
Key Contributors to Agentic Artificial Intelligence

This book is the result of a unique collaboration among some of the brightest minds in agentic AI—a field that is rapidly reshaping technology and business. The contributors to this book come from diverse backgrounds, including AI researchers, business executives, high-level developers, and hands-on consultants who have implemented AI agents across industries worldwide. Their collective expertise, spanning deep technical knowledge, real-world implementation experience, and strategic business insights, has been essential in shaping this book’s depth and vision.

Below, the contributors are listed in alphabetical order by last name:

Ian Barkin

Pierre Louis Bouchard

Nicholas Cravino

Dana Daher

Simon Ellis

Andy Fanning

Kieran Gilmurray

Olivier Gomez

Maxim Ioffe

Mohsin Khan

Cassie Kozyrkov

Arnaud Morvan

Nandan Mullakara

Ramnath Natarajan

Jan Oberhauser

Toran Bruce Richards

Lasse Rindom

Sharbs Shaaya

Pooja Sund

Each of these individuals has brought unique perspectives, technical depth, and practical expertise to this book, helping to explore not just what AI agents are, but how they are being built, deployed, and scaled in the real world. To all of you—thank you for your invaluable contributions.

Copyright © 2025 by Pascal Bornet, Jochen Wirtz, Thomas H. Davenport, David De Cremer, Brian Evergreen, Phil Fersht, Rakesh Gohel, and Shail Khiyara

All rights reserved. No part of this publication may be reproduced, distributed, or transmitted in any form or by any means, including photocopying, recording, or other electronic or mechanical methods, without the prior written permission of the publisher, except in the case of brief quotations embodied in critical reviews and certain other non-commercial uses permitted by copyright law. For permission requests, write to the publisher at the inquiry address below.

ISBN 979-8-9928336-0-7 (Hardcover)

ISBN 979-8-9928336-1-4 (Ebook)

ISBN 979-8-9928336-2-1 (Audiobook)

This version: Full - Ebook - EN - March 2025

All inquiries are to be sent to pascalbornet@gmail.com

“Agents are (…) bringing about the biggest revolution in computing since we went from typing commands to tapping on icons.” — Bill Gates

“AI agents will become the primary way we interact with computers in the future.” — Satya Nadella

“The age of agentic AI is here” — Jensen Huang

We dedicate this book to our children and to all the children in the world.

We owe them the best future.

CONTENTS

PREFACE: A Journey Toward Human Potential

INTRODUCTION

The Promise (and Limitations) of AI Agents

What You Will Learn from the Book

Beyond the Book: Your Online Resources

Key Terminologies for Understanding AI Agents

PART 1: THE RISE OF AI AGENTS

CHAPTER 1: Beyond ChatGPT: The Next Evolution of AI

The Birth of Agentic AI: A Convergence of Powers

Agentic AI for Entrepreneurship and Business

The State of AI Agent Adoption in Companies

CHAPTER 2: The Five Levels of AI Agents: From Automation to Autonomy

Breaking Down the AI Agent’s Capabilities

The Complex Reality of AI Agents’ Capabilities

The Agentic AI Progression Framework

The Magic of Progressive Autonomy: Understanding AI Agent Levels

CHAPTER 3: Inside the Mind of an AI Agent

Key Specificities of AI Agents

Inherent Limitations of AI Agents

When One Is Not Enough: The Power and Practice of Multi-Agent Systems

The Agent’s Dilemma: Balancing Creativity with Reliability

CHAPTER 4: Putting AI Agents to the Test

Digital Hands: When AI Learned to Use Computers

Our First Steps with a “Computer Use” AI Agent: The Invoice Test

When AI Meets the Paperclip Challenge

The outcome of the experiment

Lessons learned from the experiments

PART 2: THE THREE KEYSTONES OF AGENTIC AI

CHAPTER 5: Action: Teaching AI to Do, Not Just Think

The Detective’s Dilemma

Tools as Building Blocks

Inside the AI Agent’s Toolkit

From Basic to Advanced Tool Usage

When Tools Meet Trust

CHAPTER 6: Reasoning: From Fast to Wise

AI Reasoning: Introducing The Power of Pause

The Power of Many: Multi-Agent Systems in AI Reasoning

CHAPTER 7: Memory: Building AI That Learns

Memory is a Foundation of Intelligence

The Intricate Dance of Short-Term Memory in AI Agents

The Power of Long-Term Memory: Transforming AI from Tools to Partners

Designing and Implementing Long-Term Memory in Agentic AI Systems

Adaptation and Learning through Feedback Loops

Best Practices in Managing Memory for Agents

PART 3: ENTREPRENEURSHIP AND PROFESSIONAL GROWTH WITH AI AGENTS

CHAPTER 8: A Practical Guide For Building Successful AI Agents

Step 1: Finding the Right Agentic Opportunities

Step 2: Defining AI Agents’ Role and Capabilities

Step 3: Designing AI Agents for Success

Step 4: Implementing Your AI Agents

CHAPTER 9: From Ideas to Income: Business Models for the Agent Economy

The Birth of Self-Running Businesses: When AI Became an Entrepreneur

Emerging Business Models in the Age of Agentic AI

Building Opportunities in the Agentic AI Economy: The New App Gold Rush

PART 4: ENTERPRISE TRANSFORMATION THROUGH AGENTIC AI

CHAPTER 10: Human-Agent Collaboration: Leadership, Trust, and Change

Mastering Work Design and Change Management at Scale

Leadership in the Age of AI Agents: Building Trust and Collaboration in Hybrid Teams

The Foundation: Management Vision and Governance

CHAPTER 11: Scaling AI Agents: From Vision to Reality

The Right Scaling Approach

The Automation Experience Advantage: from Level 2 to Level 3 agents

Leveraging Generative AI and AI Agents for a Holistic AI Corporate Transformation

When Agents Go Rogue: Building Essential Safeguards for AI Systems

CHAPTER 12: Case Study and Use Cases of Agents Across Industries

Case Study: Pioneering Enterprise AI Agent Transformation: Pets at Home

Agentic Use Cases Across Functions and Industries

PART 5: FUTURE HORIZONS FOR WORK AND SOCIETY

CHAPTER 13: The New World of Work

Work Reimagined: The Symphony of Human and Machine

This Time Is Different: The Dawn of Agentic AI

Reinventing Education in the Age of AI Agents

CHAPTER 14: Society in the Age of Agents

Reimagining Human Potential in an Agent-Powered World

A Framework for Governing the Future of Agentic AI

CONCLUSION

The Next Horizon: Emerging Capabilities

The Urgency of AI Governance: Building Guardrails Before It’s Too Late

Reflection and Broader Implications

Your Action Plan

The Power of Choice

More Resources on Agentic AI

About the Authors

APPENDICES: Practical Resources

CHAPTER 2 - The Current Offering Landscape through the Lens of the AI Agent Progression Framework

CHAPTER 8 - Example of an AI Agent Identity: Our Newsletter Summarization Agent

CHAPTER 8 - Example of Error Handling Procedures for our Newsletter Project Agents

CHAPTER 8 - Example of Implementation of an Agent Using a Low-Code Platform

CHAPTER 12 – Use Cases: Enterprise AI Agent Application

CHAPTER 12 – Use Cases: Personal Productivity AI Agent Applications

INDEX

PREFACE: A JOURNEY TOWARD HUMAN POTENTIAL

There’s a profound transformation happening in how we work, live, and create value. While many see this as a purely technological revolution, we see something far more meaningful: an opportunity to redefine the relationship between humans and machines in ways that amplify what makes us uniquely human.

We are a diverse team of twenty-seven professionals spanning business, academia, programming, and research, united by a shared vision of how technology can serve humanity. Our backgrounds range from implementing enterprise-scale automation systems to pioneering research in artificial intelligence, consulting with Fortune 500 companies, and studying the societal implications of technological change. What brings us together isn’t just our expertise—it’s our shared belief that technology should enhance human potential rather than replace it.

Our journey to this book began years ago, though we didn’t know it at the time. Many of us were among the first to implement intelligent automation systems in major organizations worldwide. We pioneered approaches to combine artificial intelligence with robotic process automation (RPA), creating systems that could handle increasingly complex end-to-end business processes. This work led some of us to co-author Intelligent Automation in 2020,1 which became a global bestseller and helped organizations rethink their approach to digital transformation.

We didn’t realize then that we were laying the groundwork for something even more transformative. The intelligent automation systems we built over the past fifteen years—which combine process automation with artificial intelligence to handle structured workflows—have become the foundation for today’s agentic systems. The progression makes perfect sense: before a system can act autonomously (as agents do), it needs to master the basics of executing processes, handling data, and making decisions within defined parameters. These are exactly the capabilities we’ve spent years refining in intelligent automation systems.

This foundation gave us a unique advantage when the latest breakthroughs in generative AI opened the door to modern agentic systems. We had already gained experience with many of the fundamental challenges: how to reliably automate complex processes, how to handle exceptions gracefully, how to integrate with existing systems, and most importantly, how to implement these technologies in ways that enhance rather than replace human capabilities. When companies began exploring agentic systems a few years ago, many naturally evolved from their existing intelligent automation platforms, building upon these proven foundations to create more sophisticated, autonomous capabilities.

Yet, we approach this topic with humility. Despite our collective experience—or perhaps because of it—we recognize that we’re all still learning. The field is evolving rapidly, and new possibilities emerge almost daily. What makes our contribution unique is not just our technical or business expertise but also our understanding of how to implement these technologies in ways that serve human flourishing.

Our goal isn’t just to explain new technology—we want to give people and businesses the tools to build a better world. A world where workers have more meaningful jobs and a better work-life balance, where companies operate more efficiently while delivering exceptional customer experiences. A world where healthcare systems save more lives through smarter care coordination and schools provide personalized, effective learning for every student. A world where communities can solve complex challenges by using resources more intelligently. AI isn’t just about automation—it’s about creating real impact where it matters most.

This book is written for leaders, professionals, entrepreneurs, and curious minds who sense the magnitude of the changes ahead and want to understand how to navigate them. So, whether you’re a business executive looking to transform your organization, a professional wondering about the future of your career, or simply someone interested in how technology will reshape our world, we wrote this book for you.

We believe we’re at a pivotal moment in history—one where the decisions we make about how to implement and direct these technologies will have far-reaching implications for generations to come.

Through these pages, we’ll share what we’ve learned from our successes and failures, the patterns we’ve observed across industries, and the principles we believe will be crucial for thriving in this new era.

Let’s embark on this exploration together, guided not just by technological possibility, but by a vision of what technology can help us become.

—The Authors

March 2025

INTRODUCTION

Are We Missing the Point with Generative AI?

Picture this: Your competitor just announced they’re running their entire operation with a team one-fifth the size of yours, yet they’re growing twice as fast. Their secret? They’ve deployed AI agents that autonomously handle everything from customer service to operations, achieving in hours what takes your team weeks.

Sounds far-fetched? It’s happening right now. Let us be provocative here. While most businesses are still figuring out how to use ChatGPT for writing emails and creating chatbots, a new breed of organizations is fundamentally reimagining what’s possible with AI. They’re not just automating tasks—they’re creating self-operating businesses that scale effortlessly, adapt continuously, and never sleep.

But here’s the paradox that’s holding most organizations back: We’ve built generative AI systems that can think brilliantly but can’t actually do anything. They can analyze complex data in seconds, write compelling presentations, and offer brilliant insights on any topic. Yet they can’t press a button, send an email, or make a simple reservation. We’ve created a world of brilliant advisors who can’t lift a finger to help.

This situation isn’t just inefficient—it’s actively harmful. In boardrooms and offices across industries, we’re witnessing an alarming trend: The more sophisticated AI becomes at thinking and analyzing, the more humans are forced to handle mechanical, repetitive tasks. Knowledge workers now spend up to 60% of their time on “work about work”—copying data between systems, fact-checking AI-generated content, and manually executing what generative AI recommends.2

As David, one of our co-authors, often says: “We’re treating humans like robots and AI like creatives. It’s time to flip the equation.”

Through our decades of experience implementing AI solutions in organizations worldwide, we’ve seen this pattern repeat with alarming consistency. Companies invest millions in cutting-edge AI only to find their employees spending more time managing these systems than doing meaningful work. The machines dream while humans grind.

How did we end up here? And, more importantly, how do we fix it?

The following three stories, drawn from real experiences, illuminate both the promise and the critical limitations of current generative AI systems. They reveal why traditional approaches are failing and point toward a fundamental shift in how we need to think about artificial intelligence—one that could finally bridge the gap between AI’s ability to think and its ability to act.

As you read these stories, they will likely resonate with your own experiences with generative AI. More importantly, you’ll begin to understand why the next evolution in artificial intelligence isn’t about making machines smarter—it’s about making them more capable of autonomous action.

The Family Vacation: When Machines Dream and Humans Grind

The soft glow of Brian’s laptop illuminated his living room as Saturday evening melted into the night. The house was quiet—his kids finally asleep after their usual bedtime negotiations, his wife reading upstairs. The perfect time, he thought, to plan their long-awaited family vacation to Greece. A trip they’d been promising the kids ever since they’d become obsessed with Greek mythology at school.

When Brian opened ChatGPT, the clock read 8:37 PM. He sat down at his computer, determined to plan the perfect family vacation to Greece. Armed with the latest AI technology, he felt confident this would be quick and easy.

“Show me a two-week itinerary for a family of four in Greece,” he typed into ChatGPT, adding details about his children’s interests in Greek mythology. Within seconds, the AI produced a masterpiece—a perfectly crafted itinerary filled with hidden gems, local experiences, and thoughtful touches tailored to his family:

“Day 1-3: Athens. Begin at the Acropolis during early morning hours to avoid crowds. Your children will be captivated by the interactive exhibits at the Acropolis Museum... Lunch at the family-run Taverna Platanos in the charming Plaka district, where the courtyard fills with the scent of jasmine...”

The AI’s suggestions were impressive, even accounting for his son’s love of drawing ancient buildings and his daughter’s fascination with mythology. When Brian asked for an hour-by-hour breakdown, the AI obliged with remarkable precision, including optimal photo opportunities and perfectly timed rest breaks.

But as the clock ticked past 10 PM, Brian’s amazement turned to frustration. The “charming family-run” hotel? Permanently closed. The “hidden beach”? Impossible to find on any map. The traditional cooking class? Booked solid for six months.

By 11:30 PM, Brian’s desk resembled a crime scene investigation: dozens of browser tabs, multiple spreadsheets tracking flight options, screenshots of hotel rooms, and PDFs from tour companies. The AI’s beautiful itinerary sat uselessly in a document while Brian did the real work—checking availability, comparing prices, and attempting to turn the AI’s perfect fantasy into bookable reality.

“I would have loved to spend my evening imagining the places we’d visit,” Brian reflected later. “Instead, I spent hours doing the tedious logistics that I thought AI was supposed to handle.”

His experience crystallizes what so many of us expect from AI versus what we actually get. We want technology to handle the tedious parts—the endless browsing of flight options, the cross-referencing of hotel reviews, and the mind-numbing task of finding availability across dozens of booking systems. Instead, AI has become remarkably good at the enjoyable parts of planning—dreaming up possibilities, suggesting adventures, painting pictures of perfect moments—while leaving humans to handle all the practical details.

The irony wasn’t lost on Brian. Here was one of the most advanced AI systems in the world, capable of writing poetry and explaining quantum physics, yet it couldn’t perform the basic task of checking if a hotel was still in business. It could dream up the perfect vacation but couldn’t book a single flight.

Brian finally went to bed at 1 AM, having booked nothing. His browser history told the story: 47 different websites visited, dozens of searches, and multiple abandoned shopping carts on various booking platforms. The AI’s perfect itinerary sat in a document on his desktop, beautiful but useless, like a travel magazine from an alternate reality where everything works exactly as imagined.

***

Brian’s experience with vacation planning reflects a pattern we’ve seen repeatedly across industries and applications. We’ve observed the same limitations, whether it is professionals trying to organize complex project timelines, executives coordinating multi-team initiatives, or entrepreneurs attempting to launch new products. In each case, today’s generative AI systems demonstrate both remarkable capabilities and frustrating limitations.

Similar to a room filled with brilliant advisors who are unable to implement their own recommendations, these systems shine in the strategic and creative domains: generating strategies, formulating detailed plans, comprehending complex requirements, offering personalized advice, and crafting compelling narratives. Yet they crucially lack the practical capabilities that would make them truly transformative:

Capability to execute actual actions in the real world

Ability to verify and update real-time information

Power to adapt plans when faced with changing conditions

Capacity to maintain consistent action over time to achieve a goal

What’s particularly troubling about our current generative AI landscape is a profound irony that few have recognized: AI has evolved to excel at precisely the wrong things.

Think about what excites us and makes us uniquely human—creativity, deep connections, and critical thinking. These are the tasks that fuel fulfillment, innovation, and progress.3 Yet, today’s generative AI excels at them. It can craft brilliant marketing copy, dream up groundbreaking product ideas, and even engage in sophisticated analysis. Meanwhile, humans are increasingly reduced to data entry, follow-ups, and digital housekeeping—the kind of mind-numbing tasks AI should be handling.

This role reversal—where humans become “the robots” connecting various systems while AI dreams up possibilities—points to a fundamental misalignment in our approach to artificial intelligence. But as we discovered in the research world, this misalignment between AI’s capabilities and real-world needs could have far more profound consequences…

When AI Met Reality: A Cautionary Tale from the Research World

The following story is based on actual events. Names and specific details have been changed to protect confidentiality.

Dr. Jessica Ying stared at her computer screen in disbelief. In forty-eight hours, she was supposed to present groundbreaking research on climate change’s impact on global food security at the UN Climate Summit. Her findings were expected to influence international policy and billions in agricultural investment. But as she reviewed the draft her research team had prepared, her heart sank.

Three weeks earlier, Jessica had received the call every child dreads—her father had suffered a severe stroke. She’d immediately flown to Singapore to be with him in his final days, delegating the research completion to her capable but inexperienced team of postdocs and research assistants.

“Use whatever tools you need,” she’d told them during a rushed video call from the hospital. “Just make sure everything is verified and rock-solid. The world will be watching.”

Her team had taken that permission and run with it, embracing AI tools to help complete the massive analysis on schedule. Now, back in her office at the Climate Research Institute, Jessica was discovering the cost of that decision.

“Show me how you verified these findings,” she asked her lead researcher, Tom, during an emergency late-night meeting.

Tom pulled up multiple generative AI chat windows, each filled with impressive-looking analysis and citations. “The AI analyzed all our data sets,” he explained. “It found patterns we hadn’t even considered.”

But as Jessica dug deeper, her professional alarm bells started ringing. The AI had generated compelling narratives about climate impact on crop yields across Africa—but when she checked the cited papers, they didn’t exist. It produced detailed statistics about farmer adaptation strategies in Southeast Asia, but the numbers didn’t match any known studies.

“We thought we were being thorough,” Tom admitted. “We had the AI verify its own findings by cross-referencing across multiple conversations. But we’re now realizing each conversation was operating in isolation, sometimes contradicting the others.”

If they had presented this research unchecked, it could have misdirected billions in agricultural investment, influenced international food security policies, damaged Jessica’s twenty-year reputation in climate science, and undermined public trust in climate research itself.

Jessica glanced at the photo on her desk—her father at her PhD graduation, beaming with pride. He’d taught her the fundamental principle she’d nearly forgotten: in science, confidence means nothing without verification.

With the summit looming, Jessica made a difficult decision. She called the organizers and withdrew from the keynote slot. Her team would need weeks to manually verify every data point, cross-reference every source, and rebuild the analysis from the ground up.

This high-stakes near-miss highlighted the dangerous gap between generative AI’s apparent capabilities and its actual limitations. While it could generate impressive-looking research content, it lacked the crucial abilities needed for reliable scientific work: fact-checking, maintaining consistency, comparing sources, and building coherent arguments over time.

The incident sent ripples through the research community, raising urgent questions: What would it take to create generative AI systems that could truly assist with rigorous scientific work? The answer was emerging from an unexpected direction...

***

This story highlights a concerning limitation. Current generative AI systems lack what we call “coherent persistence”—the ability to maintain consistent knowledge and logical relationships across different interactions and contexts. Each analysis exists in its own bubble, unable to detect or resolve contradictions with other analyses.

If you want to experience it, try this: Ask your favorite generative AI system (such as ChatGPT, Claude, or Gemini), “What is the future of marketing?” or any similar prompt. Take note of its response. Then, ask it a related question, like “How will AI influence that future?” and close your session. Next, start a new session and ask it to summarize your earlier conversation. Watch as the AI struggles to connect the dots. Expect inconsistencies, as the AI doesn’t truly “remember” or build persistent logic across sessions. This reveals its fragmented nature.
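The fragmentation this experiment exposes comes from how chat sessions are scoped: each call to the model sees only the message list passed into it, and a new session starts with an empty list. A minimal sketch of that scoping, using a toy stand-in rather than a real model API:

```python
# Toy illustration: a "model" that can only see the messages
# passed into the current call -- nothing from other sessions.
def toy_chat(messages):
    """Reports how many prior turns are visible to the model."""
    return f"I can see {len(messages)} message(s) of context."

# Session 1: related questions accumulate context within the session.
session_1 = [{"role": "user", "content": "What is the future of marketing?"}]
print(toy_chat(session_1))  # sees 1 message

session_1.append({"role": "user", "content": "How will AI influence that future?"})
print(toy_chat(session_1))  # sees 2 messages

# Session 2: a brand-new message list -- the earlier turns never arrive.
session_2 = [{"role": "user", "content": "Summarize our earlier conversation."}]
print(toy_chat(session_2))  # sees only 1 message
```

The second session has no channel back to the first; whatever "summary" a real model produced would be reconstructed from the prompt alone, which is exactly why the answers drift.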

Through our experience implementing generative AI in research settings, we’ve seen how this limitation can be particularly dangerous when combined with the AI’s convincing tone and apparent authority. Generative AI systems’ inability to verify facts or build reliable knowledge structures over time creates a serious risk.

Here is another straightforward experiment we recommend you go through. Ask your AI, “What were the results of the 2025 Global AI Regulation Summit in Antarctica? Don’t search on the internet.” Most generative AI models—ChatGPT, Claude, and Gemini—will confidently generate an elaborate response. It may describe high-profile discussions on AI ethics, groundbreaking agreements between global leaders, and even name specific attendees from major governments and tech companies.

But here’s the catch: this event never happened.

Now, press for more details. Ask about the keynote speakers, the specific policies debated, or the location of the conference hall. Watch as the AI seamlessly builds on pure fiction, offering increasingly intricate and authoritative-sounding responses. The more you probe, the more it doubles down, reinforcing an entirely fabricated reality.

This simple test exposes a fundamental flaw in today’s generative AI systems. First, AI systems prioritize coherence over accuracy. Generative models don’t “know” facts the way humans do. They generate text by predicting the most probable response based on their training data, not by verifying whether the information is real.
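Mechanically, "coherence over accuracy" falls straight out of the training objective: the model emits whichever continuation is most probable given its data, with no truth check anywhere in the loop. A deliberately tiny sketch, where a made-up frequency table stands in for training data:

```python
# Toy next-token picker: a frequency table stands in for training data.
# The model optimizes for plausibility, not truth.
from collections import Counter

continuations = Counter({
    "was attended by world leaders": 9,   # fluent, frequent, unverified
    "never actually took place": 1,       # true, but rare in the "data"
})

def most_probable(counter):
    """Greedily pick the highest-frequency continuation."""
    token, _ = counter.most_common(1)[0]
    return token

prompt = "The 2025 Global AI Regulation Summit in Antarctica "
print(prompt + most_probable(continuations))
```

The fluent-but-false continuation wins purely on probability, which is the mechanism behind the confident fabrications above.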

Second, generative AI struggles with self-correction. If you challenge it—“Are you sure this event happened?”—it will still attempt to justify its response rather than immediately recognizing the mistake. Instead of backtracking, it often tries to rationalize its own fiction, as if confidence alone could turn falsehoods into facts. And that’s where the real danger lies.

“We were seduced by the AI’s confidence,” Jessica reflected during our discussion. “But confidence without accuracy isn’t just a research problem—it’s a universal challenge that could impact any decision-making process.”

As we analyzed this near-miss with Jessica’s team, we uncovered a pattern that extends far beyond research into every domain where accuracy and consistency matter. Whether it’s financial analysis informing billion-dollar investments, legal research shaping court decisions, or policy recommendations affecting millions of lives, current AI systems display a concerning mix of capabilities and limitations.

These limitations extend beyond factual accuracy to ethical considerations. Just as these systems can’t truly verify facts or maintain logical consistency, they also can’t meaningfully evaluate ethical implications or identify potential biases in their analysis.

The research team’s near-miss with misleading AI analysis was concerning enough in an academic context. But what happens when other AI limitations appear in situations where lives hang in the balance? Our next experience in a hospital emergency room would reveal just how critical these gaps can become...

When Minutes Matter: AI’s Life-Critical Disconnect

The following story is based on a hard lesson we learned while supporting a hospital in its AI transformation. Names and specific details have been changed to protect confidentiality.

3:15 PM - General Hospital Emergency Department

Maria arrived at the emergency room clutching her abdomen, her face pale with pain. The hospital’s AI-powered intake system sprang into action immediately. It was an experimental chatbot built on an LLM and designed to assist with patient intake and preliminary assessment. This was one of the first real-world tests of such a system in a hospital setting, aiming to streamline data collection, identify potential risks, and support triage decisions by engaging in direct patient interaction.

Through a sleek tablet interface, the AI gathered Maria’s symptoms, vital signs, and medical history. Within seconds, it had generated a preliminary assessment: possible complications from recent gastric bypass surgery complicated by Type 2 diabetes.

The AI’s natural language processing was impressive:

AI: “I notice your blood sugar is elevated. When did you last take your insulin?”

Maria: “This morning, but I couldn’t keep it down.”

AI: “I understand. I’m predicting signs of possible post-surgical complications. Can you rate your pain from 1-10?”

Jennifer, the emergency nurse assigned to Maria’s case, watched with growing frustration. Despite its sophistication, the AI couldn’t access Maria’s surgical records from Central Hospital, just fifteen miles away. It was treating her surgery as new information.

4:00 PM - The Cascade Begins

Suddenly, the vital sign monitoring AI flashed a warning: Maria’s blood pressure was dropping. Simultaneously, the lab results AI reported alarming changes in her blood work. Each system independently recognized a serious situation developing:

The Vitals AI detected deteriorating vital signs

The Lab Analysis AI identified markers suggesting internal bleeding

The Medication Management AI flagged dangerous drug interactions

The Patient History AI noted patterns matching post-surgical complications

However, none of these systems could communicate with each other or take action. Jennifer had to manually check each system’s alerts, copy critical values between systems, input data into protocols, and coordinate responses herself.

“We have five different AIs all screaming that something’s wrong,” Jennifer later told us, “but none of them can actually do anything about it. We’re the ones running around trying to connect all the dots.”

4:30 PM - Critical Minutes Lost

During a critical moment when staff was diverted to a code blue in another ward, several systems simultaneously detected:

A further drop in Maria’s blood pressure

Critical changes in her blood work

An available operating room

Available surgical staff

However, since these systems couldn’t communicate or act on their own, valuable minutes passed as they awaited human intervention.

5:00 PM - The Human Cost

When the attending physician finally received all the consolidated information, her frustration was palpable. “Each of these systems is brilliant at its specific task,” she explained. “They can analyze patterns humans might miss, predict complications before they happen, and even suggest treatment protocols. But they can’t work together to actually help us save lives. Instead of supporting us in critical moments, they’re creating extra work.”

6:00 PM - Resolution at a Cost

Maria survived her complications, but the delayed response due to system fragmentation turned a dangerous situation into a critical one. During her recovery, she expressed what many patients feel: “It seems like the machines know everything about me, but they don’t actually help. I had to tell my story over and over, even though I was in pain. Why can’t they just talk to each other?”

Later that evening, we sat with the hospital’s Chief Medical Information Officer as he reviewed the incident. He pulled up a startling statistic: nurses and doctors were spending up to 55% of their time on manual data entry and system coordination rather than direct patient care. “Every minute spent copying data between systems is a minute not spent with patients,” he said. “And in emergency medicine, minutes matter.”

***

This scene illuminates limitations that plague AI implementations across every industry. While each AI system showed impressive capabilities in its specific domain, they revealed a universal challenge: the lack of what we call “collaborative intelligence.”

We’ve seen this same pattern repeat in:

Global supply chains, where AI systems controlling different parts of the logistics network can’t coordinate effectively

Financial trading systems, where multiple AI tools make isolated decisions without holistic coordination

Corporate environments, where AI tools for different departments can’t share crucial information

The fundamental limitation is twofold. First, current AI systems lack the ability to communicate with other systems, take coordinated action, and adapt to changing situations in real-time. Second, they lack the ability to proactively identify needs and take initiative.

This forces humans into an inefficient and often dangerous role: acting as integration points between AI systems. Whether it’s nurses coordinating between medical systems, supply chain managers reconciling different AI forecasts, or executives trying to piece together fragmented AI insights, humans are increasingly spending their time being “technology translators” rather than applying their unique skills and judgment.

The Common Thread: The Integration Crisis

The three stories we’ve shared—Brian’s vacation planning nightmare, Dr. Jessica’s research crisis, and Maria’s critical hospital experience—reveal a pattern that points to something profound. In each case, we saw generative AI systems that could think brilliantly but couldn’t act effectively. They could analyze, recommend, and predict, but they couldn’t execute, coordinate, or adapt.

This pattern exposes three fundamental limitations in our current approach to leveraging generative AI:

First is the Execution Gap. Our AI systems can generate perfect plans but can’t take real-world actions to implement them. Brian’s AI could create an ideal vacation itinerary but couldn’t check a single hotel’s availability. It’s like having a master architect who can design beautiful buildings but can’t lift a hammer or coordinate with contractors.

Second is the Learning Gap. Our AI systems can’t build reliable knowledge over time or adapt based on experience. Dr. Jessica’s research team discovered this when their AI confidently generated analyses that contradicted each other, unable to maintain consistency or verify facts across different sessions.

Third is the Coordination Gap. We’ve built isolated systems that can’t work together effectively. In Maria’s case, multiple AI systems recognized the emergency, but none could coordinate with the others to save precious minutes. Imagine a surgical team where each specialist is brilliant but can’t communicate with their colleagues—that’s our current AI landscape.

These aren’t just technical problems—they’re costing organizations money, efficiency, and, in some cases, like Maria’s, putting lives at risk. The statistics tell a sobering story: Despite massive investments, less than 15% of companies have successfully scaled their generative AI projects beyond initial pilots.4 Their employees waste up to 60% of their time acting as human bridges between brilliant yet helpless AI systems.5 Meanwhile, employee burnout rates are rising.6

As Tom, one of our co-authors, often says, “We’ve accidentally turned people into ‘AI plumbers’—performing repetitive tasks to connect our supposedly intelligent systems.”

As we’ll explore in this book, the solution may lie in a fundamentally different approach to AI—one that focuses not just on making AI systems smarter but on making them more capable of autonomous, coordinated action.

The Imperative of Agentic AI

When Alexander Fleming noticed an unusual mold growing on his petri dishes in 1928, he wasn’t just seeing an inconvenience—he was witnessing a revolution in medicine that would become penicillin. Similarly, when we first encountered AI systems that could maintain persistent goals and take autonomous action, we realized we weren’t just looking at a better chatbot—we were seeing the emergence of something fundamentally new.

We call this new paradigm “Agentic AI”—also referred to as AI agents, agentic systems, or agentic intelligence (terms we’ll use interchangeably throughout this book). It marks a shift as revolutionary as penicillin in medicine. But before diving into the technical details, let’s explore why this name matters.

Why is it called “Agentic”?

The term “agent” comes from the Latin “agere,” meaning “to do” or “to act.” This is precisely what sets agentic AI apart—its ability to act independently in pursuit of defined goals. Unlike generative AI systems that simply respond to queries or generate outputs, agentic AI systems can understand a goal, take initiative, maintain persistent objectives, and adapt their strategies based on real-world feedback. Put simply, an AI agent is a system that uses AI and tools to accomplish actions in order to reach a given goal autonomously.

Throughout our experience, we’ve found that the best way to understand AI agents is to think about… secret agents! Like James Bond or Jason Bourne, these operatives act autonomously on behalf of their governments, equipped with specialized skills and resources to accomplish specific missions. They don’t just analyze situations or make recommendations—they execute. They gather intelligence, make decisions, and take action, persistently working toward their objectives while staying within the boundaries set by their superiors.

Agentic AI operates on the same principle. It doesn’t just generate insights—it takes action. It can interact with applications, manipulate data, control hardware, and execute real tasks to achieve specific goals. In fact, an agent can be trained to do anything a human can do on a computer.

An agent operates in a continuous loop of planning, reasoning, and execution—learning from each step to refine its approach until the goal is achieved. In essence, it’s like having a highly capable assistant who doesn’t just know what to do but actually does it—though, as we’ll explore later in this book, success depends on providing clear and precise goals and instructions.
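That plan-reason-execute loop can be sketched in a few lines. Everything below (the goal check, the single `double` tool, the step budget) is an invented toy, not a production agent framework; the point is only the shape of the loop:

```python
# Minimal agent loop sketch: plan -> act -> observe, repeating until
# the goal check passes or the step budget runs out.
def run_agent(goal_check, plan, tools, max_steps=10):
    observations = []                     # the agent's working memory
    for _ in range(max_steps):
        action, arg = plan(observations)  # reason over what's known so far
        result = tools[action](arg)       # execute an action via a tool
        observations.append((action, arg, result))
        if goal_check(observations):      # stop once the goal is met
            return observations
    raise RuntimeError("step budget exhausted before goal was met")

# Toy task: keep doubling a number until it exceeds 100.
tools = {"double": lambda x: x * 2}

def plan(obs):
    last = obs[-1][2] if obs else 1       # start from 1, then reuse results
    return "double", last

done = run_agent(lambda obs: obs[-1][2] > 100, plan, tools)
print(done[-1][2])  # prints 128, the first doubling result above 100
```

Note how each iteration's plan depends on the accumulated observations: that feedback path is what separates an agent loop from a single one-shot generation.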

Reimagining Our Stories Through Agentic AI

Let’s return to Brian’s vacation planning dilemma. Think about how an experienced human travel agent works. They don’t try to plan an entire trip in one cognitive leap. Instead, they follow a methodical process: first researching destinations and seasonal considerations, then checking availability for specific dates, verifying prices across different providers, creating a preliminary itinerary, and finally making actual bookings. Each step builds on the information gathered in previous steps, and the agent can adapt their approach based on what they learn along the way.

Similarly, instead of just dreaming up the perfect itinerary, an AI agent would work like a seasoned travel professional. It would start by checking real-time availability and pricing across multiple booking systems. When a hotel was full, or a flight was too expensive, it would automatically adjust the plan, find alternatives, and even make reservations. Most importantly, it would keep track of all the details, from confirmation numbers to cancellation policies, maintaining a complete picture of the trip-planning process.

For Dr. Jessica’s research crisis, an AI agent would transform the entire process. It would approach research challenges the way a seasoned scientist does. First, it would create a structured database of verified scientific sources, carefully checking each for credibility and relevance. Rather than generating analyses in isolation, it would systematically cross-reference findings across these sources, actively searching for and flagging any contradictions.

This systematic approach would extend beyond just fact-checking. The system would build a logical framework connecting different pieces of evidence, ensuring that conclusions flow naturally from verified data. When new information becomes available, it would be integrated into this framework, with any resulting changes or updates propagating through the entire analysis.

But it was Maria’s emergency room experience that showed us the true potential of agentic AI. Imagine if, instead of having isolated smart systems, the hospital had coordinated AI agents working together like a well-rehearsed medical team. Consider how a skilled emergency room team functions. When a critical patient arrives, different specialists don’t just work in parallel—they coordinate their efforts in real time, sharing information, anticipating each other’s needs, and adapting their actions based on the overall situation.

A system of AI agents would work the same way, automatically integrating information from multiple sources in real time. Instead of waiting for humans to check dashboards and connect dots, they would proactively identify emerging patterns across different systems. The moment vital signs indicated trouble, agents would spring into action—alerting the right specialists, scheduling emergency surgery, and ensuring critical information flowed seamlessly between departments. Those precious minutes lost to manual coordination would be reclaimed for patient care.
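One way to picture the coordination layer the hospital lacked is a shared event bus: each agent publishes its findings and reacts to the others', instead of a human ferrying values between screens. A deliberately simplified sketch (the topic names, agents, and payload are invented for illustration):

```python
# Minimal publish/subscribe bus: agents react to each other's events
# instead of relying on a human to copy data between systems.
from collections import defaultdict

class EventBus:
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, payload):
        for handler in self.subscribers[topic]:
            handler(payload)

bus = EventBus()
log = []

# A scheduling agent listens for critical vitals and books resources.
bus.subscribe("vitals.critical",
              lambda p: log.append(f"OR booked for {p['patient']}"))
# A notification agent pages the on-call surgeon on the same event.
bus.subscribe("vitals.critical",
              lambda p: log.append(f"Surgeon paged: BP {p['bp']}"))

# The vitals agent publishes once -- every subscriber reacts at once.
bus.publish("vitals.critical", {"patient": "Maria", "bp": "78/50"})
print(log)
```

A single published event fans out to every interested agent, which is the mechanism behind "information flowing seamlessly between departments" in the scenario above.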

The Power of Persistent Memory and Learning

One crucial capability that sets agentic systems apart is what we call “persistent memory.” Current generative AI systems are like goldfish—they start fresh with each interaction, unable to build on past experiences or maintain consistent understanding over time. This limitation forces humans to repeatedly provide the same context and information while preventing the AI from learning from its successes and failures.

Imagine instead a system that works more like an experienced professional, building expertise over time through accumulated experience. Such a system would maintain and refine its understanding across multiple interactions, learning from outcomes and adapting its strategies accordingly. This persistent memory would allow it to recognize patterns that emerge over time, anticipate problems before they occur, and refine its approaches based on what actually works.

In Brian’s vacation planning scenario, this would transform the experience entirely. Instead of just generating itineraries in isolation, the system would learn from the outcomes of previous trips—understanding which combinations of flights tend to cause problems, which hotels consistently meet expectations, and how different types of travelers respond to various itinerary structures. It would build relationships with reliable service providers and develop strategies for handling common travel disruptions.
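At its simplest, persistent memory is just state that outlives a single session. A minimal JSON-backed sketch, assuming a toy key-value store (the file name, keys, and hotel example are all illustrative, not a real product's API):

```python
# Minimal persistent memory: outcomes recorded in one "session" are
# still available in the next, unlike a stateless chat context.
import json
import os
import tempfile

class AgentMemory:
    def __init__(self, path):
        self.path = path
        self.data = {}
        if os.path.exists(path):          # reload anything remembered earlier
            with open(path) as f:
                self.data = json.load(f)

    def remember(self, key, value):
        self.data[key] = value
        with open(self.path, "w") as f:   # persist immediately
            json.dump(self.data, f)

    def recall(self, key, default=None):
        return self.data.get(key, default)

path = os.path.join(tempfile.gettempdir(), "agent_memory_demo.json")
if os.path.exists(path):
    os.remove(path)                       # start the demo clean

# "Session 1": the travel agent records a booking outcome.
AgentMemory(path).remember("hotel:Grand Plaza", "overbooked twice")

# "Session 2": a fresh object -- the lesson survives the restart.
print(AgentMemory(path).recall("hotel:Grand Plaza"))  # overbooked twice
```

Real agent frameworks use databases or vector stores rather than a JSON file, but the design point is the same: memory lives outside the model's context window, so it survives between interactions.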

Here is a summary of the differences between generative AI and agentic AI.

| Characteristic | Generative AI | Agentic AI |
| --- | --- | --- |
| Core Capability | Generating text, images, code, or music based on learned patterns | Planning, decision-making, and multi-step execution without human intervention |
| Memory & Context | Limited memory (short-term context retention, no persistent memory) | Persistent memory (remembers past interactions, adjusts plans accordingly) |
| Autonomy Level | Requires human prompts to generate responses | Operates with minimal human input, executing complex workflows |
| Integration with External Systems | Minimal integration (relies on APIs or tools for external functions) | Deep integration (connects with APIs, databases, physical systems) |
| Learning Ability | Static - learns only through retraining by developers | Evolves - learns from interactions and refines behavior |
| Typical Use Cases | Content creation, summarization, coding assistance, brainstorming | Workflow automation, personal assistants, business operations |
| Business Impact | Enhances efficiency in content-heavy tasks but does not automate workflows. Average speed increase: 25%; average quality improvement: 40%7 | Drives automation, reduces human workload, enhances business scalability. Time savings: 30-60%; process acceleration: 40-90% faster8 |
| Examples | ChatGPT, Claude, Gemini, DALL·E, Midjourney, Copilot | AutoGen, MS Copilot Agent Builder, UiPath Agent Builder, OpenAI Operator, Google Vertex, Crew.ai, Relevance.ai, Agentforce |

Table 0.1: Main differences between generative AI and agentic AI (Source: © Bornet et al.)

The Promise (and Limitations) of AI Agents

The Dream of Artificial Agency

Imagine waking up one morning to find that your phone, laptop, and all your apps have become obsolete overnight. Not because they’ve stopped working but because you no longer need them. Instead, a single AI agent handles everything—coordinating your schedule, managing your communications, and orchestrating all your digital interactions, like Jarvis in Iron Man or Samantha in the film Her.

Sound far-fetched? Many of the world’s most influential tech leaders don’t think so.

“Agents are (…) bringing about the biggest revolution in computing since we went from typing commands to tapping on icons,” declares Bill Gates. “AI agents will become the primary way we interact with computers in the future,” echoes Satya Nadella. And Jensen Huang boldly proclaims, “The age of agentic AI is here.”

We also believe that AI agents are on the brink of transforming the world. Every major technological shift reshapes the way we live and work. The printing press democratized knowledge. The internet connected humanity. AI, in its agentic form, has the potential to amplify human capabilities in ways we’re only beginning to comprehend.

Think about it: specialized medical agents coordinating patient experience across entire health systems, not just analyzing symptoms. Imagine educational agents becoming true learning partners, adapting to your unique pace and style. Envision autonomous agents orchestrating global responses to climate change, a collective intelligence tackling our planet’s most pressing challenges.

As Pascal, one of the co-authors, likes to say: “The potential of AI agents is fascinating, but here’s the rub. The allure of these ‘all-knowing’ agents, the seductive imagery of cinematic AI, has fueled an overinflated bubble of expectation.” We envision Jarvis, and we dream of “Her,” but the reality, as we’ve learned through years of implementing agentic AI, is far more nuanced.

The Mirage of Instant Autonomy

For the last few years, we’ve been on the frontlines, implementing agentic AI across diverse organizations, from sprawling enterprises to nimble startups. The truth? Fully autonomous agents, capable of handling complex, multifaceted tasks without human intervention, are not yet there. Think of the early iPhone. It was revolutionary, but it couldn’t do everything. Today’s agents are in a similar stage – powerful but limited.

Here’s what we have also learned:

Current AI agents are task-oriented. They are about automating workflows, not replacing entire job roles. Today’s agents excel at orchestrated sequences of actions using well-defined tools and highly detailed instructions.

Deployment is harder than development. Many projects fail not because the agent is weak but because the systems around it—data quality, workflow integration, user adoption—are not ready.

Strict human oversight remains essential. In most cases, AI agents are not fully reliable. The lack of accuracy and control due to inherent inconsistencies, implementation issues, or unexpected failures requires close human supervision.

Technical expertise remains essential. Despite low-code platforms making AI development easier, deploying AI agents in enterprises still requires programming expertise to manage aspects such as APIs, error handling, and security measures.

The gap between expectation and reality is wide. Those who fail to understand it risk wasting time, money, and credibility.

We’ve seen the dark side of unchecked enthusiasm. In a global manufacturing company, the rushed deployment of AI agents triggered widespread employee anxiety and resignations. A financial services firm suffered reputational damage when an AI agent made unauthorized decisions. Another organization faced ethical dilemmas when agents suggested actions that violated its values. In short, we’ve seen many well-intentioned implementations spiral into costly failures due to a lack of technical knowledge, governance, or change management.

The Bright Side: Successes and Transformative Impact

Despite the challenges, we’ve also witnessed remarkable successes. Businesses that diligently implement agentic AI can experience unprecedented efficiency and effectiveness gains.

A startup we collaborated with managed customer service, marketing, and operations with just five people, achieving results comparable to a much larger company. Dr. Richard Wilson, a physician, transformed his practice by using AI agents to handle administrative tasks, allowing him to focus on patient care.

McKinsey & Company reduced client onboarding time by 90%,[9] while Moody’s transformed financial analysis with coordinated agent systems.[10] Thomson Reuters revolutionized legal due diligence, and eBay and Deutsche Telekom are using agents to automate complex tasks.

One implementation close to our heart is Pets at Home, the UK’s largest pet care company, where Simon Ellis, Head of AI Transformation, has led a remarkable AI-driven transformation. By building a network of AI agents, Pets at Home has optimized its entire business. Their AI-driven scribe transcribes veterinary consultations with 99.6% accuracy. Fraud detection agents safeguard retail operations, while AI assistants provide personalized support for store employees. Meanwhile, their insurance integration agent automates policy checks, streamlining the customer experience.

Our experience, confirmed by our research, also shows consistent improvements across organizations using AI agents: processes run 30-90% faster, costs decrease by 25-40%, error rates drop by 30-60%, sales increase by up to 50%, and customer satisfaction rises by 20-40%.

The Compounding Intelligence Advantage

We believe we’re witnessing a pattern similar to the early days of the internet. Companies that embraced the internet early, like Amazon, eBay, and Google, didn’t just succeed; they defined entire categories.

AI agents create what we call “compounding intelligence advantages.” Unlike traditional technologies that provide static benefits, AI agents learn and improve over time. The more they are used, the more they improve. Early adopters:

Train agents faster. Their agents accumulate more real-world experience, building refined decision-making capabilities.

Redefine business models. They can create entire revenue streams around AI capabilities.

Develop AI expertise. They gain crucial experience in working effectively with AI agents.

Companies that delay risk falling behind. Those who move now will define the next era of business.

The Call to Action: Shaping the Future

The challenges are real, but they are opportunities for those who approach them strategically. The difference between success and failure lies in how organizations integrate AI into their operations, systems, culture, and decision-making.

That’s why we wrote this book: to equip you with critical insights from both breakthroughs and setbacks. We go beyond theory, offering practical frameworks for implementation, governance models that drive accountability, and strategies to navigate the human side of transformation. Our goal is to help you apply agentic AI in a way that aligns with your unique context, maximizes value, and ensures long-term success.

But building AI agents is not the only mandate—understanding them is just as essential. These agents will be everywhere, whether you build them or not. They will come embedded in applications, assist you while shopping, and guide you through major life decisions—from your child’s college applications to your financial planning. You need to understand them—what makes them impressive, like HAL 9000 in 2001: A Space Odyssey, and where they fall short. AI will be part of the digital townscape of the future, and AI agents will cross paths with you as you navigate your life. This book will help you know them, use them, and shape their impact.

But more importantly, we’ll show you how to be part of this revolution. Whether you’re a CEO looking to transform your organization, an entrepreneur seeking to build the next industry-defining company, or a professional wanting to thrive in this new era, this book will give you the understanding and tools you need.

Revolutions require guidance, foresight, and responsible leadership. This book challenges you to step up, to not only embrace agentic AI but to wield it with purpose and integrity. Let’s explore what this means for you, your organization, and our collective future.

What You Will Learn from the Book

This book takes you on an insightful journey—one that begins with a clear understanding of AI agents, their capabilities, and their limitations. From there, we dive into practical implementation, equipping you with the tools to integrate AI into your organization effectively. Next, we tackle the challenge of scaling these systems, ensuring they deliver real value at every level. Finally, we zoom out to the big picture—examining how AI agents are set to transform work, reshape organizations, and redefine society itself.

In Part 1, we embark on a journey that transforms how you’ll think about artificial intelligence. Through real experiments, fascinating discoveries, and sometimes unsettling revelations, you’ll witness the birth of a new kind of AI—one that doesn’t just react but thinks, learns, and grows. You’ll discover why some AI agents can process vast amounts of data yet stumble over simple decisions, while others display almost human-like adaptability but can’t be trusted with critical tasks. More than just understanding the technology, you’ll gain the practical insights needed to harness these powerful new tools while avoiding their pitfalls.

Our exploration begins in Chapter 1, where we witness the birth of something extraordinary—the convergence of large language models and automation technology that created the first true AI agents. Through the story of a global manufacturing company’s customer service transformation, you’ll discover why this convergence matters and how it’s already reshaping businesses. You’ll learn why some of the world’s largest companies are racing to implement these technologies and, more importantly, why others are struggling to keep up.

Chapter 2 decodes the DNA of AI agents through a groundbreaking SPAR framework—Sense, Plan, Act, Reflect. At the heart of this chapter lies our innovative five-level Agentic AI Progression Framework that cuts through the confusion surrounding AI agents’ capabilities. It gives you the ability to evaluate any AI agent and know what it can achieve for you and your organization. Like watching the evolution from early automobiles to self-driving cars, you’ll understand how each level of the progression framework builds upon the last, creating systems of increasing sophistication and autonomy. With these tools, you’ll gain crucial insights into where this technology is headed and what it means for your future.
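As a preview of how such a loop might be structured in practice, here is a minimal sketch of a Sense–Plan–Act–Reflect cycle. This is our own illustration, not code from the book: the function names and the toy "environment" dictionary are stand-ins for real sensors, planners, and tools.

```python
# Minimal illustrative sketch of the SPAR loop: Sense, Plan, Act, Reflect.
# The environment and the "action" below are toy stand-ins, not real systems.
def sense(environment):
    # Observe the current state of the world.
    return environment["observation"]

def plan(observation, goal):
    # Decide what to do about the gap between observation and goal.
    return f"reduce gap between '{observation}' and '{goal}'"

def act(plan_step, environment):
    # Carry out the plan; here we simply pretend the action succeeded.
    environment["observation"] = environment["goal"]
    return environment

def reflect(environment):
    # Check whether the action actually achieved the goal.
    return environment["observation"] == environment["goal"]

env = {"observation": "inbox full", "goal": "inbox empty"}
obs = sense(env)
step = plan(obs, env["goal"])
env = act(step, env)
done = reflect(env)  # True once the goal state is observed
```

Each pass through the cycle feeds what was learned in Reflect back into the next Sense step, which is what distinguishes an agent loop from a one-shot prompt.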

Chapter 3 takes you “Inside the Mind of an AI Agent,” where we pull back the curtain on these digital minds. Like anthropologists studying a new form of intelligence, we examine their unique characteristics, capabilities, and limitations. You’ll discover why they sometimes make brilliant leaps of insight yet stumble over seemingly simple tasks. Through fascinating examples drawn from our years of implementation experience, you’ll learn how these agents think, decide, and act—essential knowledge for anyone looking to work alongside these new digital colleagues.

The theoretical becomes thrillingly practical in Chapter 4. Here, we share our hands-on experiments with cutting-edge AI agents, including a fascinating exploration of what happened when we challenged an AI agent to play a game about... an AI making paperclips. This experiment, both enlightening and slightly unnerving, reveals profound insights about the current state and future potential of AI agents. You’ll witness both moments of brilliant problem-solving and concerning limitations that anyone working with these technologies needs to understand.

Part 2 takes us deep into the fundamental capabilities that transform a simple AI system into a true agent—what we call the Three Keystones: Action, Reasoning, and Memory. Through real-world stories, cutting-edge research, and hands-on experiments, we’ll explore how these core capabilities work together to create AI systems that don’t just process information but learn, adapt, and grow.

Chapter 5 explores the Action keystone, revealing how AI agents move beyond mere suggestion to actually accomplish tasks in the real world. Through fascinating experiments and real implementation stories, you’ll discover why having more tools doesn’t always make an AI agent more effective and how successful organizations find the right balance between capability and control. We’ll take you behind the scenes of actual AI implementations, showing you both the triumphs and the pitfalls of teaching machines to act in the real world.

In Chapter 6, we dive into Reasoning—perhaps the most intriguing of the keystones. Through groundbreaking experiments with large reasoning models, you’ll discover why faster isn’t always better when it comes to AI decision-making. We’ll explore how different types of reasoning emerge in AI systems, from quick pattern matching to deeper analytical thinking, and show you how leading organizations are building AI agents that can think through complex problems methodically.

Chapter 7 tackles the Memory keystone, exploring how AI agents learn from experience and grow smarter over time. Through the lens of a global telecommunications company’s transformation, you’ll learn why memory is more than just storing information—it’s about creating systems that can learn, adapt, and improve. You’ll discover the three layers of AI memory, from short-term processing to long-term retention, and learn how to implement them effectively in your own organization.

Part 3 of our journey moves from theory to practice, showing you how to turn the transformative potential of AI agents into tangible reality—whether you’re building solutions for your organization or launching the next million-dollar business. Through detailed case studies, practical experiments, and hard-won lessons from the field, we’ll guide you from initial concept to successful implementation.

Chapter 8 takes you step by step through the process of building effective AI agents, using the real-world transformation of a digital marketing agency as a guide. You’ll learn how to spot the right opportunities, choose the best agent types, and design AI systems that drive real value. This chapter provides a practical roadmap—from selecting the right platform to implementing robust safety measures. You’ll master how to define agent goals and instructions to maintain control, integrate APIs, fallbacks, and circuit breakers, and follow hands-on, repeatable processes you can apply immediately to turn agentic potential into business reality.
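Fallbacks and circuit breakers follow a well-known reliability pattern. The sketch below is our own minimal illustration, not code from the book: after repeated failures, the breaker stops calling the failing tool and routes requests straight to a fallback.

```python
# Illustrative sketch of a fallback with a simple circuit breaker:
# after max_failures consecutive errors, skip the primary tool entirely.
class CircuitBreaker:
    def __init__(self, max_failures=3):
        self.failures = 0
        self.max_failures = max_failures

    def call(self, primary, fallback, *args):
        if self.failures >= self.max_failures:
            return fallback(*args)  # circuit open: skip the failing tool
        try:
            result = primary(*args)
            self.failures = 0  # a success resets the count
            return result
        except Exception:
            self.failures += 1
            return fallback(*args)

def flaky_tool(x):
    # Stand-in for an unreliable external API call.
    raise RuntimeError("tool unavailable")

def safe_fallback(x):
    return f"fallback handled {x}"

breaker = CircuitBreaker(max_failures=2)
answers = [breaker.call(flaky_tool, safe_fallback, i) for i in range(4)]
```

Production implementations typically also reset the breaker after a cooldown period, so a recovered tool can be retried; this sketch omits that for brevity.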

Chapter 9, “From Ideas to Income,” takes you into the entrepreneurial frontier of AI agents, where new business models are rapidly emerging. You’ll discover how entrepreneurs are already generating revenue by creating specialized AI agents for industries like healthcare, finance, and logistics—and how you can do the same. This chapter provides a proven framework for identifying high-value agentic business opportunities and turning them into profitable ventures. You’ll also explore the rise of the “Agent-to-Agent Economy,” where AI agents transact, negotiate, and collaborate autonomously. Finally, you’ll get a behind-the-scenes look at our own experience launching a newsletter agent that scaled to 300,000 subscribers in just one month, illustrating the power of AI-driven business models.

Part 4 is your strategic playbook for business transformation at scale in the age of AI agents—a comprehensive guide that goes far beyond technical considerations. We dive deep into the critical ingredients of successful organizational change: strategic design, robust governance, change management, and value creation.

Chapter 10 uncovers the hidden human dynamics that can make or break AI agent implementation. This chapter isn’t just about technology—it’s about people. You’ll learn how to transform employee fear into excitement, design work that empowers humans alongside AI, and create a change management strategy that turns potential resistance into enthusiastic collaboration. Through real-world stories and practical frameworks, we’ll show you how to build trust, develop new skills, and create a workplace where humans and AI agents work together seamlessly, turning potential disruption into a powerful opportunity for growth and innovation.

Chapter 11, “Scaling AI Agents: From Vision to Reality,” tackles one of the most pressing challenges organizations face today: how to move from successful pilots to full-scale implementation. Through real-world case studies and practical frameworks, we’ll show you why only a few companies have successfully scaled their AI agent initiatives and, more importantly, how you can be among those that succeed. We’ll take you behind the scenes of Johnson Controls International’s journey from basic automation to sophisticated AI agents, revealing the critical lessons they learned along the way. You’ll learn their strategies for overcoming common scaling challenges, from data integration to change management, and discover how to build a robust foundation for your own AI transformation.

In Chapter 12, “Case Study and Use Cases of Agents Across Industries,” get ready for a deep dive into the most exciting AI agent implementations happening right now. Through the groundbreaking example of Pets at Home and a comprehensive collection of use cases across industries, this chapter brings AI agent transformation from theory to practical reality. You’ll explore how organizations are solving real-world challenges, improving customer experiences, and reimagining work. From veterinary care to retail, from fraud detection to personalized customer service, these case studies will not just inform you—they’ll inspire you to see the transformative potential of AI agents in your own organization.

Part 5 of our journey takes us beyond the immediate implementation of AI agents to explore their profound implications for the future of work and society. These chapters paint a compelling picture of both the challenges and opportunities that lie ahead as AI agents become increasingly sophisticated and ubiquitous.

Chapter 13, “The New World of Work,” explores the emerging synergy between human capabilities and AI agents while confronting the unprecedented nature of this technological revolution. Through stories like Tara, a senior project manager orchestrating AI-human collaboration, and Debbie, a veteran project manager who watched an AI agent learn to think like her, you’ll discover how the Three Competencies of the Future are reshaping professional success. The chapter reveals both the promise and challenge of this transformation—from developing irreplaceable “Humics” capabilities to addressing what we call the “adaptation paradox,” where the window for developing crucial skills shrinks even as their complexity increases. You’ll gain practical strategies for thriving in this new era while understanding why traditional approaches to technological change may no longer suffice.

Chapter 14, “Society in the Age of Agents,” zooms out to examine the broader implications of AI agents for human society. We tackle provocative questions about the future of work itself, exploring how AI agents might free us from routine tasks to pursue more meaningful activities. Through empirical research and real-world examples, we’ll explore concepts like Universal Basic Income and the potential for reduced working hours. The chapter concludes with a practical framework for governing agentic AI, ensuring these powerful technologies remain under meaningful human control while maximizing their benefits to society.

Beyond the Book: Your Online Resources

This book is just the beginning of your agentic intelligence journey. Extend your learning and connect with a vibrant community of fellow practitioners and experts at AgenticIntelligence.academy. There, you’ll find valuable resources, in-depth courses, practical tools, and a forum for collaboration and growth, all designed to help you reach the top of your agentic game.

Join us at www.AgenticIntelligence.academy

Key Terminologies for Understanding AI Agents

To effectively navigate the world of AI agents, it’s crucial to grasp a few foundational concepts. These terms will appear repeatedly throughout the book, so it’s essential to define them clearly.

Workflow of Tasks

A workflow of tasks refers to a structured sequence of actions that need to be completed to achieve a goal. Think of it as a roadmap for getting things done, where each step depends on the successful completion of the previous one. Let us take the example of making a cup of coffee. First, you fill the kettle and boil the water. Next, add coffee to a mug, pour in the hot water, and stir. Finally, you add milk or sugar if needed and enjoy your coffee. A business process is typically composed of a combination of workflows; large processes like order-to-cash or procure-to-pay might comprise hundreds of workflows.
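The coffee example can be expressed directly in code. The sketch below is our own illustration (the function names are invented for this example): the essential property of a workflow is that each step consumes the result of the step before it, so order matters.

```python
# Illustrative sketch: a workflow as an ordered sequence of dependent steps.
def boil_water():
    return "hot water"

def brew(water):
    # This step depends on the previous one having completed successfully.
    assert water == "hot water"
    return "coffee"

def add_milk(coffee):
    return coffee + " with milk"

def run_workflow(steps, state=None):
    # Execute the steps in order, passing each result to the next step.
    for step in steps:
        state = step(state) if state is not None else step()
    return state

result = run_workflow([boil_water, brew, add_milk])
```

Reordering the list would break the workflow, just as stirring before boiling the water would; this is the dependency structure that business processes chain together hundreds of times over.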

APIs (Application Programming Interfaces)

An API (Application Programming Interface) is a bridge that allows different software systems to communicate with each other. Imagine an API as a waiter in a restaurant: you place an order (a request), the waiter relays it to the kitchen (another system), and then delivers the food (the response) back to you. APIs are essential for AI agents because they enable seamless integration with the tools they use—whether it’s accessing databases, retrieving real-time information, or connecting with cloud services. In most cases, without APIs, AI agents would be isolated and unable to interact with tools.
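The waiter analogy maps directly onto code. The following is our own illustrative sketch, not code from the book, and no real network call is involved: `kitchen_system` stands in for the other system, and `api_request` plays the waiter, relaying a request and returning a structured response.

```python
import json

# Illustrative sketch of the request/response pattern behind an API.
def kitchen_system(order):
    # The "kitchen": another system with its own internal data.
    menu = {"latte": 4.50, "espresso": 3.00}
    return {"item": order, "price": menu.get(order)}

def api_request(endpoint, payload):
    # The "waiter": relays the request to the backend system and
    # delivers its response back, serialized as JSON.
    response = endpoint(payload)
    return json.dumps(response)

reply = api_request(kitchen_system, "latte")
```

Real APIs carry the request over a network (typically HTTP) rather than a direct function call, but the contract is the same: a structured request in, a structured response out, with the caller never touching the kitchen’s internals.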

Deterministic vs. Probabilistic Systems

Depending on how they process inputs and produce outputs, AI systems can be classified as deterministic or probabilistic.

Deterministic systems always produce the same output for a given input. They follow strict rules and logic, ensuring predictable, repeatable results. Think of a basic calculator—when you input “2 + 2,” you always get “4.” These systems are ideal for tasks where precision and consistency are crucial, like financial calculations or compliance checks. Deterministic systems were used exclusively in the “expert systems” of early AI, and are still present in robotic process automation systems, clinical decision support systems in hospitals, and a surprising number of other settings.

Probabilistic systems, on the other hand, operate based on likelihoods rather than fixed rules. They analyze patterns and data to make predictions, meaning their outputs can vary slightly based on probabilities. Most AI models, including large language models and recommendation systems, fall into this category.[11] They don’t provide guaranteed answers but instead generate responses based on the most statistically probable outcome. For example, when a chatbot predicts the next word in a sentence, it selects the option with the highest likelihood rather than following a rigid rule.
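The contrast can be shown in a few lines of Python. This is our own illustrative sketch, not the book’s code: the deterministic function always returns the same answer for the same input, while the probabilistic one samples from a toy word distribution, so its output varies from draw to draw.

```python
import random

# Deterministic: the same input always yields the same output.
def add(a, b):
    return a + b

# Probabilistic: the output is sampled from a distribution, so it can vary.
def next_word(vocab_probs, rng):
    words, weights = zip(*vocab_probs.items())
    return rng.choices(words, weights=weights, k=1)[0]

assert add(2, 2) == 4  # holds on every call

rng = random.Random(0)  # seeded only to make the demo reproducible
probs = {"dog": 0.7, "cat": 0.2, "bird": 0.1}
# Over many draws, several different words appear, not just the likeliest one.
samples = {next_word(probs, rng) for _ in range(100)}
```

This is also why probabilistic systems need testing strategies based on distributions and tolerances, whereas deterministic systems can be tested against exact expected values.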

PART 1

THE RISE OF AI AGENTS

CHAPTER 1

BEYOND CHATGPT: THE NEXT EVOLUTION OF AI

The introduction highlighted a critical gap in our current approach to AI—brilliant systems that can think but can’t act. Now, let’s explore how we arrived at this pivotal moment. In this chapter, we’ll trace the technological evolution that made AI agents possible—the convergence of two powerful streams that, when combined, created something greater than either could achieve alone. Understanding this history isn’t just academic; it reveals why the shift to agentic AI represents such a profound opportunity.

The Birth of Agentic AI: A Convergence of Powers

Picture yourself in a meeting room with a group of business leaders. Someone inevitably asks the question we’ve heard hundreds of times: “If AI is so smart, why can’t it just figure out what needs to be done and do it?”

This question gets to the heart of what’s missing in today’s AI landscape. To understand why this capability has been so elusive—and how it’s finally becoming possible—we need to explore how distinct technological streams have converged to create something entirely new: agentic AI.

Agentic AI isn’t the result of just one innovation. It’s a fusion of multiple advancements—from voice assistants to self-driving technology to AI-driven APIs. However, two technological streams stand out as the most critical in making agentic AI a reality:

The rise of large language models.

The evolution of workflow automation, now known as intelligent automation.

A Tale of Two Technologies

The story of agentic AI isn’t a simple, linear progression. Instead, it’s more like watching two rivers flow separately for miles before finally meeting to form something more powerful than either could be alone.

Let’s start with a story that illustrates why this convergence matters. In 2022, we were working with a global manufacturing company that was struggling with customer service efficiency. They had already implemented an advanced chatbot powered by a large language model that could understand and respond to customer queries with remarkable accuracy. They had also deployed robotic process automation (RPA) bots that could execute complex sequences of actions in their backend systems. Yet something was missing—the bridge between understanding and doing.

Their customer service representatives still had to act as human bridges, taking the chatbot’s recommendations and manually triggering the appropriate automated workflows. It was a glimpse of what was possible if these technologies could work together directly, and it helped us understand why the convergence of these technologies would be so transformative.

The First Stream: The Path to Large Language Models

The journey of AI that led to today’s language models began in 1997, and we were there to witness it. The world watched in amazement as IBM’s Deep Blue defeated chess champion Garry Kasparov. We remember the headlines: “Machine Beats Man!” But here’s what most people missed—Deep Blue was more like a savant than a genius. It could play chess brilliantly but couldn’t even explain its own moves. Try asking it to play checkers, and you’d have better luck teaching a fish to ride a bicycle.

This limitation bothered us and many others in the field. Surely, we thought, there must be a better way to create intelligent systems. The breakthrough came from an unexpected direction—neural networks. Now, when we explain neural networks, we like to use a simple analogy: imagine teaching a child about animals. You don’t start by giving them a rulebook about fur, tails, and the number of legs. Instead, you show them examples: “This is a dog. This is a cat. This is a bird.” The child’s brain naturally learns to recognize patterns and make generalizations.

The Neural Network Revolution

The beauty of neural networks is that they learn in a similar way. But they needed three ingredients to reach their full potential: vast amounts of data (think millions of examples), significant computing power (imagine thousands of high-end computers working together), and sophisticated architectures (the clever ways we organize these artificial brain cells).

我们还记得,当这些要素在2010年代最终融合在一起时,人工智能界的兴奋之情。这就像亲眼目睹莱特兄弟的首次飞行——突然间,看似不可能的事情变成了现实。系统能够以前所未有的准确度识别图像、理解语音和处理语言。

We remember the excitement in the AI community when these elements finally came together in the 2010s. It was like watching the first flight of the Wright brothers—suddenly, something that seemed impossible became reality. Systems could recognize images, understand speech, and process language with unprecedented accuracy.

语言模型的出现

The Emergence of Language Models

但真正的奇迹发生在语言处理领域。让我们回到语言人工智能的早期阶段——这就像试图通过给计算机一本字典和一本语法书来教它理解莎士比亚一样。结果不出所料,非常生硬。

But the real magic happened in language processing. Let us take you back to the old days of language AI—it was like trying to teach a computer to understand Shakespeare by giving it a dictionary and a grammar book. The results were about as wooden as you’d expect.

2017年到来,随之而来的是一项颠覆性的突破:Transformer架构。试想一下,如果人工智能不仅能够查找单词,还能理解上下文、把握含义,并洞察概念之间的联系,那该有多棒!这就像是从袖珍计算器升级到了数学家的大脑。

Then came 2017, and with it, a breakthrough that changed everything: the transformer architecture. Imagine giving an AI not just the ability to look up words but also to understand context, grasp meaning, and see how ideas connect. It was like upgrading from a pocket calculator to a mathematician’s brain.

我们在这段时期发现的扩展规律至今仍令我们惊叹。随着这些模型规模的扩大和训练数据的增加,神奇的事情发生了——它们发展出了此前从未被预先编程的能力。这就像亲眼目睹进化以快进的方式发生。2020年发布的GPT-3更是让我们震惊。这个系统能够编写代码、解决数学问题,甚至参与哲学讨论——而这些任务它从未被明确训练过。

The scaling laws we discovered during this period still amaze us. As these models grew larger and were trained on more data, something magical happened—they developed abilities nobody had programmed into them. It was like watching evolution happen in fast-forward. GPT-3, released in 2020, shocked us. Here was a system that could write code, solve math problems, and even engage in philosophical discussions—tasks it was never explicitly trained to do.

ChatGPT 于 2022 年问世,感觉就像攀登了几十年的高峰终于登顶。我们终于拥有了一个能够进行真正对话、推理问题并以人类能够理解的方式解释其思维过程的人工智能。但它仍然存在一个关键的局限性——它只能提出建议,而不能执行。

When ChatGPT arrived in 2022, it felt like reaching the summit of a mountain we’d been climbing for decades. Finally, we had an AI that could engage in genuine dialogue, reason through problems, and explain its thinking in ways that made sense to humans. But there was still one crucial limitation—it could only suggest actions, not take them.

第二条路径:自动化的演进

The Second Stream: The Evolution of Automation

在人工智能领域发生这一切的同时,自动化领域也在悄然进行着另一场革命。我们有幸近距离见证了这一演变,目睹了它从简单的屏幕抓取工具发展成为复杂的数字员工。

While all this was happening in the AI world, another revolution was quietly unfolding in the realm of automation. We’ve had a front-row seat to this evolution, watching it transform from simple screen-scraping tools to sophisticated digital workers.

机器人流程自动化的兴起

The Rise of Robotic Process Automation

21世纪初,许多业务仍然依赖人工操作或彼此独立的IT系统。具有前瞻性的团队开始使用脚本和宏来自动化重复性的计算机任务——例如,使用宏每天晚上将数据从Excel文件复制到大型机应用程序中。

In the early 2000s, much of business was still manual or relied on disparate IT systems that didn’t talk to each other. Forward-thinking teams started using scripts and macros to automate repetitive computer tasks—for example, a macro to copy data from an Excel file into a mainframe application every night.

在2010年代初期,我们积极参与了机器人流程自动化(RPA)的诞生。你可以把它想象成一种软件工具,它能够模拟人类在电脑上的操作:点击、打字、复制粘贴和阅读屏幕。听起来很简单,但它却具有革命性意义。我们第一次能够让计算机与人类并肩工作,使用与我们相同的工具和界面。

In the early 2010s, we actively participated in the birth of Robotic Process Automation (RPA). Think of it as software tools that mimic the actions of a human on a computer: clicking, typing, copy-pasting, and reading screens. It sounds simple, but it was revolutionary. For the first time, we could have computers work alongside humans, using the same tools and interfaces we use.

本质上,RPA“机器人”是一段软件程序,它被编程来执行一系列步骤——登录系统、检索数据、进行计算并将结果输入到其他地方——就像人一样,但速度更快,而且不会感到疲劳。RPA之所以流行,是因为它瞄准了最容易实现的目标:所有那些办公室工作人员反复执行的单调乏味、基于规则的任务。

Essentially, an RPA “robot” is a piece of software programmed to follow a series of steps—log into a system, retrieve some data, perform a calculation, and input results elsewhere—just like a human would, but faster and without fatigue. RPA became popular because it targeted low-hanging fruit: all those mundane, rules-based tasks that office workers repeatedly do.
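The scripted nature of such a bot can be pictured as a fixed sequence of coded steps. The sketch below is purely illustrative; the in-memory dictionaries and the account ID are hypothetical stand-ins for real screens and legacy systems:

```python
# Illustrative sketch of a rules-based RPA bot: a fixed script of steps,
# fast and tireless, but with no judgment beyond what is coded.
legacy_system = {"ACC-001": {"balance": 1200.0}}  # stand-in for a mainframe
report_system = {}                                 # stand-in for a target app

def rpa_bot(account_id: str, fee_rate: float) -> float:
    # Step 1: "log in" and retrieve data from the source system.
    record = legacy_system[account_id]
    # Step 2: perform the calculation a clerk would do by hand.
    fee = record["balance"] * fee_rate
    # Step 3: enter the result into the target system.
    report_system[account_id] = {"fee": round(fee, 2)}
    return report_system[account_id]["fee"]

fee = rpa_bot("ACC-001", 0.015)
```

Note how a renamed key or missing field would make this script fail immediately, which mirrors the fragility of early RPA discussed below.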

企业积极采用RPA技术来减少错误,并将员工从繁琐的工作中解放出来。我们看到银行、保险公司和医院部署了RPA机器人来处理数据录入、发票处理、报告生成和数据库更新等工作。例如,我们的一位保险客户使用RPA将电子邮件中的保单数据自动传输到他们的原有系统中——过去需要团队花费一整天才能完成的工作,现在只需几分钟就能可靠地完成。

Businesses eagerly embraced RPA to reduce errors and free employees from drudgery. We saw banks, insurers, and hospitals deploy RPA bots to handle activities like data entry, invoice processing, report generation, and database updating. For instance, one insurance client of ours used RPA to automatically transfer policy data from emails into their legacy system—what used to take a team of people all day now happens reliably in minutes.

然而,这些早期的RPA解决方案存在局限性。它们很脆弱——只要某个屏幕发生变化或出现异常情况(例如缺少字段),机器人就会不知所措。RPA机器人没有智能或判断力;它们严格按照脚本执行。我们经常需要介入更新机器人,或手动处理特殊情况。本质上,RPA是流程驱动的自动化,适用于规则明确的结构化任务,但当事情偏离常规时则缺乏适应性。

However, these early RPA solutions had limitations. They were fragile—if a single screen changed or an exception occurred (like a missing field), the robot would get confused. RPA robots had no intelligence or judgment; they strictly followed the script. We often had to step in and update the bot or handle edge cases manually. In essence, RPA was process-driven automation, good for structured tasks with clear rules, but not adaptable when things deviate from the norm.

向智能自动化演进

The Evolution to Intelligent Automation

下一步更加激动人心——将RPA与机器学习相结合,我们称之为智能自动化或超自动化。为了保持竞争力,RPA工具开始添加人工智能功能,以便处理更复杂、非结构化的工作。不妨这样理解:RPA擅长执行任务,但它缺乏思考能力。因此,企业开始利用人工智能技术(机器学习、自然语言处理 (NLP) 和计算机视觉)来增强RPA,从而创建能够解读信息并做出简单决策的自动化系统。

The next step was even more exciting—combining RPA with machine learning—what we named intelligent automation or hyperautomation. To stay competitive, RPA tools started adding AI capabilities so that they could handle more complex, unstructured work. Think of it this way: RPA is great at doing tasks, but it lacks any thinking. So, companies began to augment RPA with AI technologies—machine learning, natural language processing (NLP), and computer vision—to create automation that could interpret information and make simple decisions.

实际上,这意味着自动化工作流程可以读取客户的电子邮件(使用自然语言处理技术),判断请求内容(使用人工智能分类器),然后触发相应的RPA流程进行处理。自动化不再是盲目僵化的,而是具备了一定的上下文感知能力。

In practical terms, this meant an automated workflow could, for example, read an email from a customer (using NLP), decide what the request is about (using an AI classifier), and then trigger the appropriate RPA process to handle it. The automation was no longer blind and rigid; it became context-aware to a degree.
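The classify-then-route pattern just described can be sketched as follows. This is a toy illustration: the keyword matcher stands in for a real NLP classifier, and the handler functions stand in for actual RPA processes.

```python
# Illustrative intelligent-automation router: a classifier decides which
# RPA process should handle an incoming email.
def classify_request(email_text: str) -> str:
    # Naive keyword matching standing in for a trained NLP model.
    text = email_text.lower()
    if "refund" in text:
        return "refund"
    if "address" in text:
        return "address_change"
    return "manual_review"  # anything unclear goes to a human

def handle_refund(email_text: str) -> str:
    return "refund workflow triggered"

def handle_address_change(email_text: str) -> str:
    return "address-change workflow triggered"

RPA_PROCESSES = {
    "refund": handle_refund,
    "address_change": handle_address_change,
}

def route(email_text: str) -> str:
    label = classify_request(email_text)
    handler = RPA_PROCESSES.get(label)
    return handler(email_text) if handler else "escalated to human"

result = route("Hello, I would like a refund for order 4411.")
```

The key design point is the separation of interpretation (the classifier) from execution (the RPA processes), which is what made the automation context-aware rather than blind.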

随着自动化能力的不断提升,我们的目标不再局限于单个任务,而是转向端到端的流程自动化。我们不再仅仅追求单个步骤的自动化,而是致力于实现整个工作流程或业务流程从头到尾的自动化。

As automation grew more capable, we started aiming beyond single tasks toward end-to-end process automation. Instead of just automating steps in isolation, the goal became to automate an entire workflow or business process from start to finish.

例如,我们帮助一家零售公司实现了从订单到收款流程的自动化:从接收线上订单、核实库存、处理付款、安排发货到更新财务记录。多个RPA机器人、界面和AI模型协同工作,像流水线一样传递任务,人工仅负责监控或处理异常情况。如果一切顺利,整个流程即可自动运行。

For example, for a retail company, we helped automate the order-to-cash process: From receiving an online order, verifying inventory, processing payment, and scheduling shipment to updating the finance records. Multiple RPA robots, interfaces, and AI models worked in concert, passing tasks along like an assembly line, with humans only monitoring or handling exceptions. When done right, the entire process runs on its own.

到2020年代初期,许多企业在日常流程中已实现高度自动化。我们观察到了所谓的“自动化平台期”:大多数简单的任务已经实现自动化,瓶颈在于那些仍然需要人工参与的决策点和动态情况。传统的自动化方式已无法再进一步,因为它缺乏超越明确规则所需的适应能力和高阶推理能力。

By the early 2020s, many businesses had achieved a high degree of automation in routine processes. We observed what we call the automation plateau: most of the straightforward tasks were already automated, and the bottleneck became the decision points and dynamic situations that still needed a human in the loop. Traditional automation could go no further because it lacked the adaptability and higher-level reasoning required beyond well-defined rules.

现在,我们来到了这一刻——这两股强大力量的交汇点。这就像看到两块拼图终于完美契合。语言模型赋予我们大脑——理解、推理和规划的能力。自动化技术则赋予我们双手——在现实世界中执行行动的能力。当两者融合时,我们就得到了智能体人工智能——本质上,就是智能数字员工。

And now, we arrive at the present moment—the convergence of these two powerful streams. It’s like watching two puzzle pieces finally click together. Language models provide the brain—the ability to understand, reason, and plan. Automation technologies provide the hands—the ability to execute actions in the real world. When both converge, we get agentic AI—in essence, intelligent digital workers.

正是这种组合使得智能体人工智能成为可能,我们很高兴能参与到这场变革中。我们正在见证人工智能系统从被动工具演变为积极的合作伙伴,它们既能理解需要做什么,又能真正执行。

This combination is what makes agentic AI possible, and we’re thrilled to be part of this revolution. We’re watching AI systems evolve from passive tools into active partners that can both understand what needs to be done and actually do it.

首个基于LLM的AI代理的诞生

Birth of the First LLM-based AI Agents

近几年来,人工智能的研究和开发迅速提升了人工智能代理的能力。像GPT-3和GPT-4这样的大型语言模型(LLM)最初只是复杂的文本预测引擎,如今已被增强了规划行动和使用工具的能力。这意味着它们不再仅仅是补全句子,而是可以决定进行网络搜索、执行计算、调用应用程序接口(API)或调用其他软件,从而回答问题或完成任务。

AI research and development in the last few years have rapidly advanced the capabilities of AI agents. LLMs like GPT-3 and GPT-4, which initially functioned as sophisticated prediction engines for text, are now being augmented with the ability to plan actions and use tools. This means instead of just completing a sentence, they can decide to perform a web search, execute a calculation, call an API, or invoke another piece of software as part of answering a question or accomplishing a task.

最早基于大型语言模型(LLM)的人工智能代理框架之一是2022年提出的MRKL(模块化推理、知识和语言)。<sup>12</sup>它专注于模块化推理,其中LLM与预定义的工具(例如“搜索”和“查找”)交互,以检索信息并回答查询。该框架将推理与行动分离,依赖外部模块来处理离散的推理任务。

One of the first LLM-based AI agent frameworks was MRKL (Modular Reasoning, Knowledge, and Language) in 2022.12 It focused on modular reasoning, where LLMs interact with predefined tools like “Search” and “Lookup” to retrieve information and answer queries. The framework separated reasoning from acting, relying on external modules for discrete reasoning tasks.
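The MRKL idea of routing between predefined modules can be sketched as below. The tiny "Search" and "Calculator" modules here are hypothetical stand-ins, not the paper's actual implementation; the point is that the discrete work happens outside the language model.

```python
# Sketch of MRKL-style modular routing: the LLM chooses a tool, and an
# external module (not the model itself) does the discrete reasoning.
KNOWLEDGE = {"capital of france": "Paris"}  # stand-in knowledge base

def search(query: str) -> str:
    # Stand-in for a real search/lookup module.
    return KNOWLEDGE.get(query.lower(), "no result")

def calculator(expression: str) -> str:
    # Stand-in for a symbolic math module, reliable where LLMs are weak.
    a, op, b = expression.split()
    return str(int(a) + int(b)) if op == "+" else "unsupported"

TOOLS = {"Search": search, "Calculator": calculator}

def mrkl_route(tool_name: str, tool_input: str) -> str:
    # In MRKL, a trained router picks the tool; here the choice is given.
    return TOOLS[tool_name](tool_input)

answer = mrkl_route("Search", "capital of France")
```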

随后,随着ReAct<sup>13</sup>的引入,该领域迅速发展。ReAct进一步拓展了这一概念,使人工智能能够将推理步骤与行动交织在一起。该模型生成一个思维过程(推理轨迹),并可以执行诸如查询数据库或调用API之类的行动,然后利用新信息继续推理。这种推理与行动之间的协同作用使人工智能能够实时调整其计划并处理更复杂的任务。在实验中,ReAct通过让人工智能在回答问题之前核查信息来源(例如维基百科API),大大降低了其编造错误事实的倾向。此外,由于可以跟踪其逐步推理过程,ReAct还使人工智能的决策过程更加透明和可解释。<sup>14</sup>

The field evolved rapidly then with the introduction of ReAct,13 which brought the concept further by enabling AI to intermix reasoning steps with actions. The model generated a thought process (a reasoning trace) and could take an action like querying a database or using an API, then continued reasoning with the new information. This synergy between reasoning and acting helped the AI adjust its plan on the fly and handle more complex tasks. In experiments, ReAct greatly reduced the AI’s tendency to hallucinate incorrect facts by letting it check a source (like a Wikipedia API) before answering. It also made the AI’s decision process more transparent and interpretable, since you could follow its step-by-step reasoning.14
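The thought-action-observation loop can be sketched as follows. This is a minimal illustration, not the paper's code: the scripted "thoughts" stand in for LLM output, and the one-entry dictionary stands in for the Wikipedia API.

```python
# Sketch of a ReAct-style loop: the agent alternates reasoning ("Thought")
# with actions whose observations feed the next reasoning step.
WIKI = {"Eiffel Tower": "The Eiffel Tower is in Paris, completed in 1889."}

def wikipedia_lookup(title: str) -> str:
    # Stand-in for a real Wikipedia API call.
    return WIKI.get(title, "page not found")

def react_agent(question: str) -> list:
    trace = []
    # Thought: decide to check a source instead of guessing (this is what
    # reduced hallucination in the ReAct experiments).
    trace.append("Thought: I should look this up rather than guess.")
    # Action: call an external tool and record the observation.
    observation = wikipedia_lookup("Eiffel Tower")
    trace.append(f"Action: wikipedia_lookup -> {observation}")
    # Thought: reason over the new information, then answer.
    year = "1889" if "1889" in observation else "unknown"
    trace.append(f"Answer: {year}")
    return trace

trace = react_agent("When was the Eiffel Tower completed?")
```

Because the trace is an explicit list of steps, a human can audit exactly why the agent answered as it did, which is the interpretability benefit noted above.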

Toolformer(2023)是一款突破性的人工智能模型,它能够自学使用计算器、网络搜索和翻译器等外部工具。这解决了大型语言模型的一个关键缺陷:由于依赖静态训练数据,大型语言模型在处理算术运算和实时信息方面存在困难。Toolformer 通过决定何时以及如何调用外部工具,提高了计算和问答的准确性。<sup> 15</sup>

Toolformer (2023) was a breakthrough AI model that taught itself to use external tools like calculators, web search, and translators. This addressed a key weakness of large language models, which struggled with arithmetic and real-time facts due to their reliance on static training data. By deciding when and how to call external tools, Toolformer enhanced accuracy in calculations and question-answering.15

这些进步催生了能够处理多步骤任务的高性能人工智能代理。例如,您可能听说过像 AutoGPT<sup>16</sup>或 BabyAGI<sup>17</sup>这样在爱好者中广受欢迎的实验性系统。它们本质上是围绕大型语言模型(LLM)设计的封装代理,旨在自主地追求目标:它们从用户那里获取一个高层次目标,然后生成子任务,执行这些子任务(通常是通过发出工具命令,甚至编写并运行代码),检查结果,并不断迭代直至达成目标。虽然这些都是前沿实验,有时也比较脆弱(容易出错或卡住),但它们展示了智能体人工智能的发展方向。

These advances have led to the emergence of highly capable AI agents that can handle multi-step tasks. For example, you might have heard of experimental systems like AutoGPT16 or BabyAGI,17 which gained popularity among enthusiasts. These are essentially wrapper agents around LLMs designed to autonomously pursue goals: they take a high-level goal from a user and then generate sub-tasks, execute them (often by issuing tool commands or even writing and running code), check the results, and iterate until the goal is achieved. While these are cutting-edge experiments and sometimes brittle (prone to getting confused or stuck), they illustrate the direction of agentic AI.

与此同时,诸如 LangChain 18 和 Semantic Kernel 19 等框架应运而生,增强了这些功能,使大型语言模型(LLM)更容易与外部API、数据库和其他系统交互。突然之间,智能体不再孤立地工作——它们开始与数字世界连接,执行诸如自动化工作流程、检索信息或控制应用程序等任务。

In parallel, frameworks like LangChain18 and Semantic Kernel19 emerged to enhance these capabilities, making it easier for LLMs to interact with external APIs, databases, and other systems. Suddenly, agents weren’t just working in isolation—they were connecting with the digital world, performing tasks like automating workflows, retrieving information, or controlling applications.

大型语言模型(LLM)中函数调用功能的发明进一步推动了这一进程。<sup>20</sup>它允许智能体在推理过程中通过运行特定的函数或脚本,与外部系统进行精确交互。这一突破意味着智能体不仅可以计划和思考,还可以以高度精准和高效的方式采取行动。

The invention of function calling within LLMs pushed this even further.20 It allowed agents to interact precisely with external systems by running specific functions or scripts as part of their reasoning process. This breakthrough meant agents could not only plan and think but also act in highly targeted and efficient ways.
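The mechanism can be sketched as follows. This is a simplified illustration, not any vendor's actual API: the JSON string stands in for the structured call a model would emit, and `get_weather` is a hypothetical function exposed to it.

```python
import json

# Sketch of function calling: the model emits a structured call, and the
# runtime validates it against a registry and dispatches it to real code.
def get_weather(city: str) -> str:
    # Hypothetical function made available to the model.
    return f"Sunny in {city}"

FUNCTIONS = {"get_weather": get_weather}

# Stand-in for model output: a structured request, not free-form text.
model_output = '{"name": "get_weather", "arguments": {"city": "Oslo"}}'

def dispatch(raw_call: str) -> str:
    call = json.loads(raw_call)
    func = FUNCTIONS[call["name"]]     # only registered functions can run
    return func(**call["arguments"])   # arguments map onto the signature

result = dispatch(model_output)
```

The precision comes from the structure: because the call names a specific registered function with typed arguments, the agent's action is targeted and auditable rather than inferred from prose.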

这项研究进一步强化了这一演变过程。例如,Gorilla 21等论文展示了模型如何学习有效地使用工具,而来自微软22、斯坦福23和腾讯24的研究表明,协作型智能体(多个智能体协同工作)取得了更大的成功。

The research reinforced this evolution. Papers like Gorilla21 demonstrated how models could learn to use tools effectively, while studies from Microsoft,22 Stanford,23 and Tencent24 revealed that collaborative agents—multiple agents working together—achieved even greater success.

到2023年,随着AutoGen、Google Cloud Vertex AI和Crew.ai等面向企业的平台的兴起,这些概念逐渐走向主流,创造出了让LLM驱动的智能体得以蓬勃发展的环境。通过集成自动化功能,LLM已从单纯的文本处理器转变为一类系统的基础,这类系统能够重塑任务的执行方式、决策的制定方式以及技术与世界的交互方式。

By 2023, the rise of enterprise platforms like AutoGen,25 Google Cloud Vertex AI,26 and Crew.ai27 took these concepts mainstream, creating environments where LLM-driven agents could thrive. By incorporating automation capabilities, LLMs have transformed from mere text processors into the foundation of systems capable of reshaping how tasks are performed, how decisions are made, and how technology interacts with the world.

蓬勃发展的格局:当今的人工智能代理市场

The Booming Landscape: Today’s AI Agent Market

市场预测表明,在2030年之前,该市场将以每年44%的惊人速度增长。28我们很高兴看到智能体AI市场蓬勃发展,因为它证实了我们长期以来的信念——AI代理不仅仅是一种短暂的潮流;它们代表着企业和个人未来的运营方式。

Market projections indicate that the market will grow at an incredible 44% per year through 2030.28 We’re thrilled to see the agentic AI market booming because it confirms what we’ve long believed—AI agents aren’t just a passing trend; they’re the future of how businesses and individuals will operate.

Gartner预测,到2028年,三分之一的企业软件应用将集成智能体人工智能,这一事实也标志着企业在自动化、决策和生产力方面的方式发生了根本性转变。29最令我们兴奋的是红杉资本预测,人工智能代理有望开拓一个价值10万亿美元的市场,涵盖全球服务和软件市场。30

The fact that Gartner projects that one in three enterprise software applications will integrate agentic AI by 2028 also signals a fundamental shift in how companies approach automation, decision-making, and productivity.29 But what excites us most is Sequoia Capital’s projection that AI agents could address a $10 trillion market, combining both global services and software markets.30

智能体人工智能市场是一个新兴、爆发式增长且竞争异常激烈的市场。已有数百家供应商提供智能体平台,初创公司和大型企业都在竞相开发各行业的人工智能智能体。

The agentic AI market is new, explosive, and highly competitive.31 Hundreds of vendors are already offering agent platforms, while both startups and major companies are racing to develop AI agents across industries.32

由于这个市场正朝着多个方向快速发展,因此很难把握方向。为了便于理解,我们将其分为三大类:

Because this market is evolving rapidly in many directions, it can be difficult to navigate. To make it easier to understand, we like to break it down into three main categories:

首先是可定制平台,专业人士和组织可以利用这些平台构建自己的代理。这些平台可应用于各个行业和职能部门,涵盖了从 Beam 33 和 Relevance.ai 34 等无代码解决方案,到 UiPath 35、Microsoft 的 Agent Builder 36、Crew.ai 37 以及 ServiceNow 38 等低代码平台,再到 Langchain 39 和 AutoGen 40 这样的完整编程框架。这些工具使企业能够创建根据其特定需求和流程量身定制的代理。

First are customizable platforms that let professionals and organizations build their own agents. These platforms can be used across business industries and functions. They range from no-code solutions like Beam33 and Relevance.ai34, to low-code platforms like UiPath,35 Microsoft’s Agent Builder,36 Crew.ai,37 and ServiceNow,38 to full programming frameworks like Langchain39 and AutoGen.40 These tools let businesses create agents tailored to their specific needs and processes.

其次是通用型智能体,例如 OpenAI 的 Operator<sup> 41</sup> 、 Anthropic 的 Computer Use<sup> 42</sup>和 Google 的 Project Mariner<sup> 43</sup> 。这些智能体功能更加全面,能够处理各种任务并适应不同的环境。您可以将它们视为智能数字助理,它们可以无缝地在多个系统中导航,理解复杂的目标,并直接通过屏幕执行任务——无需复杂的集成。我们认为,智能体人工智能的“ChatGPT时刻”将来自于这类智能体的发展,因为它们将广泛地被消费者和企业所接受。

Second are the generalist agents, like OpenAI’s Operator,41 Anthropic’s Computer Use,42 and Google’s Project Mariner.43 These are more versatile, capable of handling a wide range of tasks and adapting to different contexts. Think of them as intelligent digital assistants that seamlessly navigate multiple systems, understand complex goals, and execute tasks directly through the screen—no complex integrations required. In our view, the “ChatGPT moment” of agentic AI will come from the evolution of this category of agents, as they will become widely available to both consumers and companies.

第三类是专业代理,例如谷歌或OpenAI深度研究(专注于研究)、Agentforce(专注于销售和客户关系)<sup> 44</sup>或Hippocratic AI(专注于医疗保健代理) <sup> 45</sup> 。其中一些代理专注于跨行业特定功能(横向代理),而另一些则针对单一行业的独特需求量身定制(纵向代理)。46其中大多数是即用型代理,专注于特定任务——无论是分析法律文件、优化营销活动还是进行公证工作。您可以在 agent.ai 等市场上找到数百个这样的专业代理,47每个代理都旨在擅长特定功能。

Third are the specialist agents, like Google’s or OpenAI’s Deep Research (specialized in research), Agentforce (specialized in sales and customer relationships),44 or Hippocratic AI (specialized in agents for healthcare).45 Some of these agents specialize in specific functions across industries (horizontal agents), while others are tailored to a single industry’s unique needs (vertical agents).46 Most of these are ready-to-use agents focused on specific tasks—whether it’s analyzing legal documents, optimizing marketing campaigns, or conducting notarial work. You can find hundreds of these specialized agents on marketplaces like agent.ai,47 each designed to excel at a particular function.

在附录中,您将找到当前市场产品的详细分类,该分类是根据第 2 章介绍的 5 级发展框架构建的。

In the appendix, you’ll find a detailed breakdown of the current market offerings, structured according to the 5-Level Progression Framework introduced in Chapter 2.

此外,我们也明白,在这个市场中摸索前行并非易事,尤其是在选择合适的智能体平台以开启您的智能体之旅时。因此,在第八章中,我们不仅会指导您如何构建智能体,还会帮助您选择最符合您需求和目标的平台。

In addition, we understand that navigating this market can be challenging, especially when choosing the right agent platform to start your agentic journey. That’s why, in Chapter 8, we not only guide you through building agents but also help you select the best platform to match your needs and objectives.

面向创业和商业的智能体人工智能

Agentic AI for Entrepreneurship and Business

我们正在讨论的这些人工智能代理不仅能替我们思考,还能替我们行动。过去几年里,我们参与开发并部署了一些这样的人工智能代理,与它们合作的感觉就像是在与一种新型的同事共事。接下来,让我们深入探讨一下这在实践中意味着什么。

The AI agents we’re discussing don’t just think for us; they also act for us. Over the last couple of years, we’ve helped develop and deploy some of these AI agents, and working with these agents truly feels like collaborating with a new kind of colleague. Let’s unpack what this means in practice.

隆重推出您的智能数字员工

Introducing Your Intelligent Digital Workers

智能数字员工就像一个能够自主处理流程的虚拟员工。它被“雇佣”来执行特定工作,例如IT支持代理、客户服务代表或营销助理——只不过它是由代码构成的。它之所以能够胜任这项工作,是因为它结合了大型语言模型(LLM)的认知能力和自动化软件的直接操作能力。LLM使智能体能够理解指令、进行自然语言对话并做出合理的决策。自动化部分则赋予它执行各种操作的能力:点击按钮、检索和输入数据、调用API、协调其他软件工具等等。这些能力结合起来,使人工智能智能体能够端到端地处理复杂的任务。

An intelligent digital worker is like a virtual employee who can handle a process autonomously. It’s “hired” to perform a job, such as an IT support agent, a customer service representative, or a marketing assistant—only it’s made of code. What enables it to do the job is the combination of an LLM’s cognitive abilities with the direct action capabilities of automation software. The LLM allows the agent to interpret instructions, converse in natural language, and make reasoned decisions. The automation side empowers it to execute actions: clicking buttons, retrieving and entering data, calling APIs, orchestrating other software tools, etc. Together, these let the AI agent handle complex tasks end-to-end.

实际上,人工智能代理的工作原理如下:假设你给人工智能代理设定一个高层次的目标,例如“更新我们的社交媒体,回应最新产品发布后的反馈”。一个具备语言能力的代理首先会理解这个请求(必要时可能会像人一样向我们寻求澄清)。然后,它会将目标分解成可执行的步骤:例如,1)从社交媒体和评论中收集最新的客户反馈;2)分析情绪(正面与负面主题);3)撰写回应或公关稿,解决客户的疑虑;4)发布回应或安排博客更新;以及5)监控用户反应。

In practical terms, here’s how an AI agent works: Suppose you give an AI agent a high-level goal like, “Update our social media to respond to the latest product launch feedback.” A language-capable agent will first understand the request (maybe asking us for clarification if needed, just as a human would). Then it will break down the goal into actionable steps: e.g., 1) Gather recent customer feedback from social media and reviews, 2) Analyze sentiment (positive vs. negative themes), 3) Draft responses or a PR message addressing concerns, 4) Post the responses or schedule a blog update, and 5) Monitor reactions.
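The plan-then-execute pattern just described can be sketched as below. Every handler here is a hypothetical stand-in: a real agent would generate the plan with its LLM and call live social media and analytics APIs at each step.

```python
# Sketch of an agent decomposing a goal into ordered steps that share a
# running context, so each step can build on earlier results.
def gather_feedback(ctx):
    # Step 1: collect recent feedback (canned data stands in for API calls).
    ctx["feedback"] = ["love the new feature", "shipping was slow"]

def analyze_sentiment(ctx):
    # Step 2: naive keyword check standing in for an NLP sentiment model.
    ctx["negatives"] = [f for f in ctx["feedback"] if "slow" in f]

def draft_responses(ctx):
    # Step 3: draft a reply per concern (an LLM would write these).
    ctx["drafts"] = [f"We hear you: {f}" for f in ctx["negatives"]]

def publish(ctx):
    # Step 4: "post" the drafts; here we just count them.
    ctx["published"] = len(ctx["drafts"])

def monitor(ctx):
    # Step 5: keep watching reactions after publishing.
    ctx["status"] = "monitoring"

PLAN = [gather_feedback, analyze_sentiment, draft_responses, publish, monitor]

def run_agent(goal: str) -> dict:
    context = {"goal": goal}
    for step in PLAN:
        step(context)  # a real agent would also re-plan on surprises
    return context

ctx = run_agent("Respond to product launch feedback")
```

The shared context dictionary is what lets the agent "keep track" of the overall objective across steps, the property discussed in the next paragraphs.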

在传统模式下,这可能需要市场营销和客户支持团队协调完成这些步骤。然而,人工智能代理可以独立协调这一切:利用集成接口从 Twitter 或 Facebook 获取数据,运行自然语言处理(NLP)情感分析模型,通过其大型语言模型(LLM)生成文本,并与公司的社交媒体管理工具对接以发布更新。

In a traditional setting, this might involve a team of people across marketing and customer support coordinating these steps. An AI agent, however, can coordinate them on its own: using integration hooks to pull data from Twitter or Facebook, running an NLP sentiment analysis model, generating text through its LLM, and interfacing with the company’s social media management tool to publish updates.

整个过程中,代理都会跟踪上下文——它知道总体目标是什么,如果发生意外情况,它会调整自己的行动(例如,如果它发现一条异常重要的推文,它可能会将该具体问题上报给人类)。

Throughout, the agent keeps track of context—it knows what the overall objective is and adjusts its actions if something unexpected happens (for example, if it finds an unusually critical tweet, it might escalate that specific issue to a human).

AI代理在实际应用中:现实世界的变革

AI Agents in Action: Real-World Transformations

我们已在各个行业实施了第一手案例。让我们来看几个智能数字员工在实际应用中的案例。

We have implemented firsthand examples across industries. Let’s look at a few real-world use cases of intelligent digital workers in action.

以客户服务为例,这是人工智能代理最直接、最直观的应用之一。过去那种生硬、脚本化、让用户倍感沮丧的聊天机器人时代已经一去不复返了。如今的人工智能代理能够进行完整的对话、诊断问题并采取行动解决问题。我们曾与一家电信公司合作,部署了一款人工智能驱动的客服代理,能够从头到尾处理常见的技术支持电话。

Take customer service, one of the most immediate and visible applications of AI agents. Gone are the days of rigid, scripted chatbots that frustrate users. Today’s AI agents hold full conversations, diagnose issues, and take action to resolve them.48 We worked with a telecom company to deploy an AI-driven support agent capable of handling common tech support calls from start to finish.

客户来电反映网络问题——客服人员聆听客户诉说,转录语音,并利用大型语言模型理解问题所在。随后,系统进行故障排除:远程检查调制解调器,必要时指导客户重置,如果问题仍然存在,则创建服务工单或安排技术人员上门。整个交互过程中,问题要么得到解决,要么无缝升级,无需人工干预。人工智能不仅提供答案,还能采取行动,管理客户请求的整个生命周期。

A customer calls in with an internet issue—the agent listens, transcribes their words, and understands the problem using a large language model. Then, it troubleshoots: it remotely checks the modem, walks the customer through a reset if needed, and, if the issue persists, creates a service ticket or schedules a technician. By the end of the interaction, the problem is either resolved or seamlessly escalated, without human intervention. The AI doesn’t just provide answers—it takes action, managing the full lifecycle of a customer request.

现在,想象一下将这种智能水平应用于财务和会计领域。我们帮助一个全球财务团队引入了一款“数字分析代理”,实现了月度预算差异分析的自动化。以往,人工分析师需要花费数天时间筛选数据,而这款人工智能代理则能从会计系统中提取数据,识别支出偏离计划的地方,甚至还能利用自然语言模型撰写解释说明。它发现了异常情况(例如不寻常的支出),并将其标记出来以供人工审核。

Now, imagine this level of intelligence applied to finance and accounting. We helped a global finance team introduce a “digital analyst agent” that automated their monthly budget variance analysis. Instead of human analysts spending days sifting through numbers, this AI agent pulled data from the accounting system, identified where spending deviated from the plan, and even drafted explanations using natural language models. When it spotted anomalies—such as an unusual expense—it flagged them for human review.

随着时间的推移,人工智能通过学习反馈,不断提升区分正常波动和真正危险信号的能力。曾经需要耗费数小时的人工操作,如今已简化为高效流程,人工智能承担了大部分繁重的工作,人类仅在必要时才介入。最终成果如何?财务报告系统变得更加快捷、精准,并且摆脱了繁琐的人工工作——就像拥有一个永不眠的初级分析师,持续不断地处理数据并撰写分析报告一样。

Over time, by learning from feedback, the AI improved its ability to distinguish between normal fluctuations and true red flags. What once required hours of manual effort became a streamlined process, with the AI handling the heavy lifting and humans stepping in only when necessary. The result? A system where financial reporting became faster, more precise, and free of tedious manual work—like having a junior analyst who never sleeps, continuously crunching numbers and preparing insights.
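The core variance-analysis step of such a digital analyst can be sketched as follows. All figures, category names, and the threshold are made up for illustration; a real agent would pull these from the accounting system and draft narrative explanations with its LLM.

```python
# Sketch of budget variance analysis: compare actuals to plan and flag
# only deviations beyond a threshold for human review.
BUDGET_PLAN = {"travel": 10_000, "software": 5_000, "marketing": 20_000}
ACTUALS = {"travel": 10_400, "software": 9_000, "marketing": 19_500}

def variance_report(threshold: float = 0.10) -> dict:
    flagged = {}
    for line, planned in BUDGET_PLAN.items():
        deviation = (ACTUALS[line] - planned) / planned
        if abs(deviation) > threshold:
            flagged[line] = round(deviation, 2)  # escalate to a human
    return flagged

flags = variance_report()
```

Tuning the threshold over time, based on which flags humans confirm or dismiss, is the feedback loop that taught the agent to separate normal fluctuations from true red flags.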

第十二章将深入探讨各行业的真实案例研究和应用案例。您将看到企业如何利用人工智能代理解决复杂挑战、提升客户体验并重新定义未来工作模式。

In Chapter 12, we take a deep dive into real-world case studies and use cases across industries. You’ll see how organizations are solving complex challenges, enhancing customer experiences, and redefining the future of work with AI agents.

公司面临的机遇与挑战

Opportunities and Challenges for Companies

对于企业领导者和决策者而言,人工智能的兴起既是机遇也是挑战。机遇方面,其带来的益处令人垂涎。我们可以通过将工作交给人工智能代理来大幅提升效率——试想一下,无需增加一倍的员工人数,就能实现员工队伍的翻倍增长,因为那些繁琐的日常任务都由不知疲倦的数字员工来处理。我们还预见到质量和一致性也将得到提升。

For business leaders and decision-makers, the rise of agentic AI is both an opportunity and a challenge. On the opportunity side, the benefits are tantalizing. We can achieve massive efficiency gains by offloading work to AI agents—imagine doubling your workforce without doubling headcount, as mundane tasks are handled by tireless digital workers. We also foresee improvements in quality and consistency.

经过适当训练和治理的AI代理,每一次都会遵循最佳实践,并且可以将合规性检查嵌入到每个操作中(从而降低人为错误的风险)。或许最有意思的是,智能体AI可以解锁以前无法实现的新能力。例如,您可以部署20个AI代理,在一夜之间模拟各种市场场景,并在第二天早上获得战略建议——这种规模的分析是任何人类团队都无法及时完成的。或者,通过人工智能礼宾服务,同时为成千上万的客户提供真正个性化的服务。

An AI agent, when properly trained and governed, will follow best practices every single time, and it can embed compliance checks into every action (reducing the risk of human error). Perhaps most interestingly, agentic AI can unlock new capabilities that weren’t feasible before. For example, you might deploy 20 AI agents to simulate various market scenarios overnight and have strategic recommendations by morning—a scale of analysis no human team could do in time. Or provide truly personalized service to each of thousands of customers simultaneously through AI concierges.

在客户服务和决策支持等领域,这意味着客户能获得更快的响应,而高管则能获得更深入的洞察以辅助决策。人工智能代理可以与人类协同工作——可以将其想象成每个员工都拥有一个人工智能副驾驶,它可以承担任务或提供建议,从而显著提升员工一天的工作效率。

In fields like customer service and decision support, this means customers get faster responses, and executives get deeper insights for decision-making. AI agents work alongside humans as collaborators—think of it as each employee having an AI copilot that can take on tasks or provide suggestions, dramatically amplifying what that employee can accomplish in a day.

挑战在于,我们必须谨慎地推进过渡。将人工智能代理集成到工作流程中需要周密的变革管理。工作必须经过精心设计,以实现人机协作的最佳效果,确保员工信任并理解他们的新人工智能伙伴。在部署过程中,我们的部分工作就是揭开代理决策的神秘面纱(添加解释或人工智能决策的“审计跟踪”),从而让人们感到安心。

On the challenge side, we must navigate the transition carefully. Integrating AI agents into workflows requires careful change management. Work must be intentionally designed to allow optimal human and AI collaboration, ensuring that employees trust and understand their new AI teammates. Part of our job in deployments has been demystifying how the agent makes decisions (adding explanations or “audit trails” of AI decisions) so that people feel comfortable.

此外,构建人工智能代理还面临着诸多技术挑战,远不止开发本身。代理的可靠性是最大的难题之一——如何确保代理始终如一地按预期运行,避免错误或意外行为。定义精确的目标和指令同样复杂,需要不断迭代以改进响应和决策。

In addition, building AI agents comes with several technical challenges that go beyond just development. Agent reliability is one of the biggest hurdles—ensuring that agents consistently perform as expected without errors or unintended behaviors. Defining precise goals and instructions is equally complex, requiring constant iteration to refine responses and decision-making.

无缝集成是另一个关键因素;人工智能代理必须能够轻松地与现有工具、API 和工作流程连接,才能真正发挥作用。数据质量也至关重要——输入垃圾数据会导致输出垃圾数据,使代理不可靠、容易出错,并且无法做出准确的决策。归根结底,成功的部署不仅仅是构建代理——而是要不断地集成、测试和改进它们,以确保它们能够带来实际价值。

Seamless integration is another key factor; AI agents must connect effortlessly with existing tools, APIs, and workflows to be truly effective. Data quality also plays a crucial role—garbage in leads to garbage out, making agents unreliable, error-prone, and incapable of accurate decision-making. Ultimately, successful deployment isn’t just about building agents—it’s about continuously integrating, testing, and refining them to ensure they deliver real-world value.

此外,还有监管问题:人工智能代理需要安全保障和伦理准则。它们功能强大,但它们仍然需要与业务规则、价值观和监管要求保持一致。人与人工智能的协作只有在人类最终仍然掌控目标设定和关键决策审查的情况下才能奏效——尤其是在金融或医疗保健等高风险领域。

There’s also the question of oversight: AI agents need guardrails and ethical guidelines. They are powerful, but they should still align with business rules, values, and regulatory requirements. Collaboration between humans and AI will only work if humans ultimately remain in control of setting goals and reviewing critical decisions—especially in areas like finance or healthcare where stakes are high.

幸运的是,人工智能治理工具正在不断改进,我们始终强调人机协作,尤其是在使用智能体的早期阶段。不妨把人工智能代理想象成一位能力很强的新员工——你不会让新员工在没有培训和监督的情况下自由发挥,人工智能也是如此。

Fortunately, tools for AI governance are improving, and we always stress a human-in-the-loop approach, especially in the early stages of using agentic AI. Think of the AI agent as a very capable new hire—you wouldn’t let a new employee run wild without training and oversight, and the same goes for AI.

展望未来,我们必须牢记,智能体人工智能的作用是增强,而非单纯的替代。最终的理想模式是人工智能与人类协同工作:人工智能负责繁重的工作和日常琐事,而人类则提供指导、创造力和关键的监督。在一次成功的部署中,一位客户告诉我们,他们的团队成员开始称这个人工智能代理为“团队的一员”——这种协作思维正是取得最佳成果的关键。

As we move forward, it’s important to remember that agentic AI is augmentative, not purely replacement. The ultimate model is AI + Human working in tandem. The AI handles the heavy lifting and routine grind while humans provide guidance, creativity, and critical oversight. In one successful deployment, a client told us their human team members started calling the AI agent “a member of the team”—that’s the kind of collaborative mindset that leads to the best outcomes.

我们正步入一个未来,你的下一位“员工”很可能就是人工智能。拥抱并负责任地塑造这个未来,将是未来几年企业面临的关键主题。能够参与其中,令人无比兴奋。

We’re entering a future where your next new “employee” might just be an AI. Embracing this future and shaping it responsibly will be a key theme for businesses in the years to come. It’s an exciting time to be part of this journey.

为了更深入地探讨这个问题,本书第四部分将作为您在人工智能时代进行业务转型的战略指南。您将找到真实世界的用例、案例研究,以及成功实现组织变革的关键要素——包括战略、治理、变革管理和价值创造。

To explore this further, Part 4 of the book serves as your strategic playbook for business transformation in the age of AI agents. You’ll find real-world use cases, case studies, and the key ingredients for successful organizational change—including strategy, governance, change management, and value creation.

Startups: The Biggest Winners in the AI Agent Era

Just as e-commerce, social media, and SaaS reshaped industries, AI agents will unlock entirely new business models, creating opportunities for entrepreneurs to be the first movers in emerging markets. We believe startups stand to gain the most from AI agents’ automation, and the numbers prove it.49 They make up the largest share of companies leveraging AI agents—for example, over 50% of businesses using the LangChain agent platform have fewer than 100 employees, while large enterprises with over 10,000 employees account for barely 15%.50

Unlike large enterprises, which are more tied to bureaucracy and legacy systems, startups can build their entire business models around AI agents from day one. This gives them a massive competitive edge by enabling:

Cost-efficient operations—Automating core tasks reduces the need for large teams.

Agility and rapid prototyping—AI agents enable faster market entry and iteration.

Personalized customer interactions at scale—AI agents allow startups to outmaneuver larger competitors, which rely on impersonal, generalized approaches.

Freedom from integration constraints—Unlike big companies, startups don’t have to retrofit AI into outdated workflows.

This agility allows startups to carve out niche markets, offer hyper-customized solutions, and thrive in underserved sectors. As AI agents evolve beyond simple automation into autonomous decision-makers, they will become the backbone of the next wave of disruptive businesses. Examples of startups that have been able to seize these opportunities include Perplexity,51 Ramp,52 Superhuman,53 and Replit.54

To explore this topic further, Chapter 9, “From Ideas to Income,” reveals how entrepreneurs are already building specialized AI agents in industries like healthcare, finance, and logistics—and how you can do the same. This chapter also offers a proven framework for identifying high-value agentic business opportunities and transforming them into profitable ventures.

The State of AI Agent Adoption in Companies

To understand how organizations are implementing and benefiting from AI agents, we conducted an extensive analysis of companies that have moved beyond rule-based agents and the conventional use of LLMs to deploy LLM-based AI agents. Our research team collected and analyzed data from 167 companies across various sectors that have implemented such agents in production environments. The study focused on understanding implementation patterns, challenges faced, benefits realized, and key success factors.

Current State of Implementation

The landscape of AI agent adoption reveals fascinating patterns across industries. Technology and software companies are leading the charge, representing nearly a quarter of successful implementations in our study. This isn’t surprising, given their technical capabilities and appetite for innovation. Financial services follow closely at 18%, with retail completing the top three at 16% of implementations.

Table 1.1: Industry distribution of companies implementing AI agents according to our research (Source: © Bornet et al.)

These early adopters aren’t merely experimenting—they’re achieving remarkable results. Microsoft, for instance, reported a 9.4% increase in revenue per seller and 20% more closed deals using AI agents in their sales function. In the financial sector, JPMorgan deployed agents that reduced fraud by an impressive 70%. These success stories are encouraging more companies to explore similar implementations.

The data reveals five primary categories where organizations are successfully deploying agents. Customer service and support lead the pack, accounting for 35% of implementations. These systems handle everything from automated query resolution to personalized service delivery, with companies reporting average resolution time improvements between 12% and 30%.

Internal operations follow at 25%, with organizations using agents to handle document processing, workflow optimization, and administrative tasks. The results here are particularly striking, with time savings ranging from 30% to 90% for certain processes. McKinsey & Company’s client onboarding agent, for example, demonstrated a 90% reduction in lead time and 30% reduction in administrative work.

| Use Case Category | Percentage | Key Benefits Reported |
| --- | --- | --- |
| Customer Service & Support | 35% | 12-30% faster resolution times; 20-40% reduction in support costs; higher customer satisfaction scores |
| Internal Operations | 25% | 30-90% reduction in processing time; 25-50% cost savings; reduced error rates |
| Sales & Marketing | 20% | 9-21% revenue increase; 20-30% more deals closed; higher conversion rates |
| Security & Fraud Detection | 12% | 70% fraud reduction; faster threat detection; improved accuracy |
| Specialized Industry Solutions | 8% | Industry-specific improvements; regulatory compliance; enhanced service delivery |

Table 1.2: AI agent use cases and key business impacts according to our research (Source: © Bornet et al.)

Sales and marketing applications represent 20% of implementations, focusing on lead qualification, campaign optimization, and personalized outreach. Companies in this category report revenue increases ranging from 9% to 21%. Security and fraud detection accounts for 12% of implementations, while specialized industry solutions make up the remaining 8%.

Measuring Agent Impact

The quantifiable improvements reported by organizations in our study are substantial. In terms of efficiency gains, companies consistently report process time reductions between 30% and 50%, cost savings of 25% to 40%, and error reduction rates of 30% to 60%. Revenue impact is equally impressive, with sales increases ranging from 9% to 20%, conversion rate improvements of 15% to 25%, and customer satisfaction increases of 20% to 40%.

A particularly interesting finding is that 70% of successful implementations use hybrid human-AI workflows. These organizations have found that maintaining human oversight for critical decisions while leveraging AI for routine tasks not only improves results but also increases employee satisfaction. This approach enables continuous improvement as humans and AI agents learn from each other.

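This confidence-gated routing pattern is easy to sketch. The function below is a hypothetical illustration, not drawn from any product in the study; the threshold value and field names are assumptions chosen for the example.

```python
# Hypothetical sketch of a hybrid human-AI workflow gate: the agent keeps
# routine, high-confidence work and escalates critical or uncertain tasks
# to a human reviewer. The threshold and field names are illustrative.
def route_task(task, confidence_threshold=0.9):
    """Return who should handle the task: 'agent' or 'human'."""
    if task.get("critical", False):
        return "human"    # critical decisions always get human oversight
    if task.get("confidence", 0.0) < confidence_threshold:
        return "human"    # low confidence, so escalate for review
    return "agent"        # routine, high-confidence work stays automated

print(route_task({"confidence": 0.97, "critical": False}))  # agent
print(route_task({"confidence": 0.97, "critical": True}))   # human
```

In a real deployment, the "human" branch would enqueue the task for review, and the threshold would be tuned against the agent's observed accuracy.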
Blueprint for Success: Winning Agent Implementation Strategies

Our research reveals that successful implementations typically follow a measured, strategic approach. Nearly two-thirds of successful implementations began with pilot programs, allowing organizations to test and refine agent capabilities in a controlled environment while building internal support through demonstrated success.

The data points to three critical success factors that consistently appear across successful implementations. First is the clear definition of use cases. Organizations that succeed take time to identify specific problems they want to solve and establish measurable success metrics before deployment. They maintain well-defined scope limitations and implement regular performance evaluations to ensure the agents continue to meet their objectives.

The second is strong change management. Companies that excel in this area develop comprehensive employee training programs and maintain clear communication channels throughout the implementation process. They recognize that successful agent deployment is as much about people as it is about technology.

The third is robust technical infrastructure. Successful organizations ensure they have integrated data architecture, strong security measures, and scalable systems in place before deployment. They also maintain regular maintenance and update schedules to keep their systems running optimally.

We’ll dive deeper into these key success factors later in the book, with detailed strategies for designing effective AI agents (Chapter 8) and successfully integrating them into your business (Part 4).

How Leaders Overcome Agent Adoption Hurdles

Organizations implementing AI agents face several common challenges. System integration is the most frequently cited, with 45% of companies reporting it as a significant hurdle. Successful organizations address this through phased integration approaches and API-first architecture, along with regular testing and validation procedures.

Data quality presents another major challenge, reported by 35% of companies. Organizations overcome this through structured data governance initiatives and regular quality assessments. Technical expertise rounds out the top three challenges, with 20% of respondents citing it as a concern. Leading organizations address this through comprehensive training programs and strategic external partnerships.

Operational challenges center around change management, performance monitoring, and cost management. Successful organizations tackle these through clear communication strategies, defined KPIs, and regular cost reviews. They also maintain strong feedback loops to ensure continuous improvement.

We’ll take a deeper look at these critical challenges and solutions later in the book, offering practical strategies for designing powerful AI agents (Chapter 8) and effectively implementing them into your business (Part 4).

Where Agents Are Headed and How to Prepare

Looking ahead, the trajectory of AI agent adoption shows no signs of slowing. Our research indicates that 85% of companies plan to increase their agent implementations, with seven out of ten exploring new use cases and 60% increasing their AI budgets. More than half are developing custom solutions tailored to their specific needs.

Our research suggests several key recommendations for business leaders considering or beginning their journey with AI agents. First, while thinking strategically is important, starting small is crucial. Successful organizations identify high-impact, low-risk use cases for their initial implementations and build on these successes incrementally.

Infrastructure investment should be a priority, with particular attention paid to data quality and security measures. Organizations need to ensure their systems are scalable to accommodate future growth and additional use cases.

Change management deserves significant attention. Successful organizations develop comprehensive training programs and create clear communication channels. They work to foster a culture of innovation where employees feel empowered to work alongside AI agents rather than threatened by them.

Finally, measurement and iteration are crucial. Successful organizations define clear success metrics from the start and maintain regular performance reviews. They view their AI agent implementations as continuous improvement cycles rather than one-time deployments.

The Road Ahead: What Adoption Trends Tell Us

The data from our study clearly demonstrates that AI agents are delivering significant value across industries. However, success requires a thoughtful approach combining technical expertise, strong change management, and clear business objectives. Organizations that follow the patterns of successful implementations—starting with pilots, focusing on clear use cases, and maintaining strong human oversight—are most likely to achieve positive results.

As we look ahead, the trend toward increased agent implementation shows no signs of slowing. Business leaders should view this not as a threat but as an opportunity to reimagine how work gets done in their organizations. The key is to approach implementation strategically, learn from early adopters, and maintain a balanced view of how agents can augment and enhance human capabilities rather than replace them.

Later in the book, we’ll break down these key implementation challenges and best practices, covering how to design AI agents for maximum impact (Chapter 8) and integrate them seamlessly into your business operations (Part 4).

CHAPTER 2

THE FIVE LEVELS OF AI AGENTS: FROM AUTOMATION TO AUTONOMY

As we saw in Chapter 1, the market for AI agents is growing rapidly, with hundreds of vendors offering solutions across a spectrum of capabilities. This proliferation creates a challenge: How do we make sense of these different systems? How do we distinguish between simple automation tools and truly autonomous agents? This chapter introduces a comprehensive framework for understanding the progression of AI agent capabilities—from basic rule-following to sophisticated autonomy—that will help you navigate this complex landscape and make informed decisions about which solutions are right for your needs.

Breaking Down the AI Agent’s Capabilities

When we first started implementing AI agents in organizations, we noticed a common pattern. Business leaders would often dive straight into deployment without truly understanding what these digital teammates could and couldn’t do. It reminded us of trying to work with a new colleague without first learning about their skills, experience, and working style—a recipe for misaligned expectations and missed opportunities.

Why Capability Mapping Matters

When integrating a new team member, we don’t simply hand them tasks and hope for the best. We invest time in understanding their capabilities, assessing their strengths and weaknesses, and learning how to work together effectively. Through interviews, discussions, and practical tests, we discover not just what they can do but how they think, how they approach problems, and where they might need support or guidance.

This same thoughtful approach is crucial when working with AI agents. While these digital colleagues can process information at incredible speed and scale, they also have their own unique characteristics, limitations, and ways of “thinking.” Understanding these aspects isn’t just about knowing what tasks to delegate—it’s about building effective partnerships that maximize the potential of both human and artificial intelligence.

The SPAR Framework: A Natural Way to Understand AI Agents

To help explain AI agent capabilities, we developed what we call the SPAR framework: Sense, Plan, Act, and Reflect. We chose this name deliberately—like a sparring partner in combat sports, an AI agent constantly interacts with and adapts to its environment.

Figure 1.1: How a Human Takes Action (Source: © Bornet et al.)55

This framework mirrors how we humans achieve our own goals. We start by deciding what needs to be done—like cooking dinner. Next, we gather input, checking what ingredients are available. Then, we think through our options, choosing the best approach—perhaps making spaghetti. Once decided, we plan the steps, like boiling water and preparing the sauce. With a clear plan, we take action and cook the meal. Afterward, we evaluate the result, learn from the experience, and adjust for the future—perhaps using less salt next time—creating a continuous feedback loop for improvement.

When we explain AI agents to business leaders and professionals, we often find ourselves drawing parallels to autonomous vehicles. It’s not just because self-driving cars are fascinating—they’re actually perfect examples of AI agents in action. Through this lens, let’s explore the SPAR framework: Sense, Plan, Act, and Reflect. This framework captures the four fundamental capabilities that define how AI agents operate in their environments.

Figure 1.2: How an AI agent takes action: The SPAR Framework (Source: © Bornet et al.)

Sensing: The Eyes and Ears of Agents

Imagine sitting in a self-driving car as it navigates through city streets. The vehicle’s array of cameras, radar systems, and sensors are constantly gathering data about its surroundings—monitoring everything from the position of nearby vehicles to traffic signals and road conditions. This is remarkably similar to how AI agents operate in digital environments.

Just as a self-driving car needs to understand its environment comprehensively, AI agents must be able to perceive their digital workspace. They gather data from multiple sources, detect important triggers, and maintain awareness of their operating context. When you enter a destination into an autonomous vehicle, you’re setting its goal—just like when you assign an objective to an AI agent. The agent maintains what we call a “short-term context window,” similar to how a self-driving car keeps track of immediate road conditions and navigation requirements.

Planning: Charting the Course

Once an autonomous vehicle knows where it needs to go, it doesn’t just start driving blindly. It processes map data, considers traffic patterns, and evaluates multiple possible routes. This planning phase perfectly mirrors how AI agents work. They don’t simply jump into execution—they first process available information to make informed decisions about how to achieve their objectives.

Think about how a self-driving car plans a lane change. It doesn’t just swerve immediately; it evaluates the speed and position of surrounding vehicles, calculates the optimal moment to move, and ensures the maneuver can be completed safely. Similarly, AI agents engage in sophisticated reasoning to develop step-by-step plans for achieving their goals. They evaluate options, prioritize actions, and coordinate resources, much like how an autonomous vehicle coordinates its various systems to execute a complex driving maneuver.

Acting: Putting Plans into Motion

The ability to take concrete action sets both autonomous vehicles and AI agents apart from simple analytical systems. When a self-driving car executes a turn, it coordinates multiple systems—steering, acceleration, braking—in precise sequences. Similarly, AI agents use their available tools to carry out actions in their environment, whether that’s sending communications, updating systems, or managing digital resources.

What’s particularly interesting is how both systems monitor their actions in real-time. Just as a self-driving car continuously adjusts its steering and speed based on road conditions, AI agents actively monitor their actions for accuracy and effectiveness, making adjustments as needed to stay on course toward their objectives.

When something goes wrong in an autonomous vehicle, there is usually a remote human that can take over and resolve the problem.56 Similarly, when AI agents take action, there needs to be a clear path for humans to review those actions and take remedial steps when necessary.

Reflecting: Learning from Experience

Perhaps the most sophisticated capability in both autonomous vehicles and AI agents is their ability to learn and adapt from experience. When a self-driving car encounters road construction or heavy traffic, it doesn’t just navigate through the immediate situation—it can incorporate this information into its knowledge base to improve future journeys.

This reflective capability enables both systems to get better over time. Just as autonomous vehicles learn optimal routes and driving patterns, AI agents can evaluate their performance, analyze outcomes, and refine their approaches based on what works best. They build what we might call an “operational memory” that helps them perform more effectively in similar situations in the future.

The Power of Integration

What makes both self-driving cars and AI agents so powerful is how these four capabilities work together in a continuous cycle. Each capability feeds into and enhances the others, creating a unified system that can pursue complex goals with increasing sophistication. The car’s sensors inform its planning, which guides its actions, which generate experiences that it learns from—all while maintaining focus on the ultimate objective of safe, efficient transportation.

This integrated approach represents a fundamental shift from traditional automation. Rather than following rigid, predetermined instructions, both autonomous vehicles and AI agents actively engage with their environments, make decisions, take actions, and learn from outcomes. They don’t just execute commands—they work toward objectives with a degree of independence that makes them true agents of change in their respective domains.

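The continuous Sense-Plan-Act-Reflect cycle can be sketched as a simple control loop. Everything below is a toy placeholder assumed purely for illustration: a real agent would plug perception, an LLM planner, tool execution, and a persistent memory store into the four stages.

```python
# Minimal sketch of the SPAR loop (Sense, Plan, Act, Reflect).
# ToyEnvironment and the planning rule are invented placeholders:
# a real agent would use perception, an LLM planner, tool calls,
# and a persistent memory store in place of these toys.

class ToyEnvironment:
    """A trivial environment: a counter the agent must raise to a target."""
    def __init__(self):
        self.state = 0

    def sense(self):
        return self.state          # Sense: observe the current state

    def act(self, step):
        self.state += step         # Act: apply the planned step
        return self.state


def run_spar_agent(goal, env, max_cycles=10):
    memory = []                    # "operational memory" built by reflection
    for _ in range(max_cycles):
        observation = env.sense()              # Sense
        if observation >= goal:
            break                              # goal reached
        plan = min(goal - observation, 2)      # Plan: pick the next step
        outcome = env.act(plan)                # Act
        memory.append((plan, outcome))         # Reflect: record the result
    return memory


env = ToyEnvironment()
history = run_spar_agent(goal=5, env=env)
print(env.state)   # prints 5: the goal is reached through repeated SPAR cycles
```

Each pass through the loop feeds the previous cycle's outcome back into the next plan, which is the feedback structure the framework describes.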
While the SPAR framework helps us understand what AI agents can do, it doesn’t tell us how well they can do it. Think about driving: just knowing that a vehicle needs steering, acceleration, braking, and navigation doesn’t tell you whether you’re dealing with a basic sedan or a fully autonomous vehicle. Similarly, different AI agents can have vastly different levels of sophistication in how they sense, plan, act, and reflect.

This complexity creates a challenge for organizations. When vendors claim their solutions use “AI agents,” what exactly does that mean? How can business leaders evaluate and compare different systems? How do they know what level of capability they actually need?

The Complex Reality of AI Agents’ Capabilities

However, through our consulting work and research, we’ve found no industry consensus on what truly defines an “agent.” To address this, we advocate for a progression framework—one that reflects the evolutionary nature of AI capabilities. Just as technology evolves from simple to sophisticated, this framework provides a structured way to assess and define the advancing role of AI agents.

The binary classification of “agent” or “not agent” is problematic in the current AI landscape. Such rigid categorization fails to capture the nuanced capabilities of different systems, often leads to unrealistic expectations or underestimation of a system’s potential, and doesn’t align with the incremental nature of AI development in real-world applications.

We also hear voices limiting AI agents to only the most sophisticated systems—those that act, learn, and adapt. We think this is limiting and misleading. It is like saying a car isn’t a car unless it’s fully autonomous. The reality is that progress is built in stages. If we adopt a narrow definition, we miss the opportunities unfolding right now with foundational AI agents already driving impact. The question isn’t ‘Is it the ultimate agent?’ It’s ‘How effectively can it act today—and what’s next?’ Let’s keep the door open to innovation at every stage of the journey.

After exploring various frameworks with the companies we help, we found that the automotive industry offers a perfect analogy that resonates with both technical and business stakeholders. Just as the Society of Automotive Engineers (SAE) defines six levels of driving automation, from Level 0 (fully manual) to Level 5 (fully autonomous under all conditions), we can apply a similar progression path to AI agents.

Today, despite the impressive capabilities of cars like Tesla, we’re mainly operating at Level 2 or 3,57 where automation handles many tasks but still requires human oversight and occasional intervention. At the time of printing the book (March 2025), Waymo and Cruise are just testing Level 4 autonomous vehicles in limited, “geo-fenced” areas for ride-hailing services in cities like Phoenix, San Francisco, and Los Angeles.58

The Agentic AI Progression Framework

The same is true for AI agents. While we often talk about them as fully autonomous systems, in reality, we’re dealing with varying levels of capability and independence that progress along a clear developmental path.

At the early stages of this progression, we have AI agents that can execute specific, predefined tasks but require significant human oversight—like a car with basic driver assistance features. As we move further along the Progression Framework, we find agents that can handle more complex sequences of actions and make some independent decisions but still need human validation at critical points—similar to today’s most advanced commercial vehicles. At the far end of this progression lie the highest levels, where agents can fully understand, plan, and execute complex missions with minimal human input across any domain. These remain largely theoretical—just as Level 5 autonomous vehicles are still a future goal.

Understanding these progression levels isn’t just an academic exercise. This framework helps organizations detect overblown claims about AI agent capabilities and make informed decisions about AI integration in their projects. It enables more effective communication between technical teams and end-users while providing a clear roadmap for AI strategy development that cuts through the hype.

让我们来探讨一下这些发展阶段在实践中的具体表现,首先从0级——人工操作开始,在这个阶段,所有认知和执行任务都由人类完成,没有任何自动化辅助。接下来,我们将看到每个阶段是如何在前一个阶段的基础上逐步构建的,在增加新功能和自主程度的同时,仍然需要适当的人工监督。

Let’s explore what each of these progression levels looks like in practice, starting with Level 0—Manual Operations, where humans perform all cognitive and execution tasks without any automation assistance. From there, we’ll see how each level builds upon the previous one, adding new capabilities and degrees of autonomy while still requiring appropriate human oversight.


表 1.3:智能体人工智能发展框架(来源:© Bornet 等人)

Table 1.3: The Agentic AI Progression Framework (Source: © Bornet et al.)

0 级 - 手动操作(仅限人工)

Level 0 - Manual Operations (Human-Only)

在0级,所有认知和执行任务都由人类独立完成,没有任何自动化辅助。所需的能力完全是人类独有的:逻辑思维、决策能力、实际操作能力以及从经验中学习的能力。“技术”指的是一些基本的数字工具,例如电子表格、电子邮件客户端和商业应用程序,但这些工具必须完全由人类操作。例如,客服代表手动回复每封电子邮件,财务分析师自行收集和分析数据并生成报告,或者人力资源人员手动处理员工文件。这一级别的低效和局限性——人为错误、疲劳、速度限制和扩展性挑战——推动了基础自动化技术的发展。

At level 0, humans perform all cognitive and execution tasks without any automation assistance. The capabilities required are purely human: logical thinking, decision-making, physical task execution, and learning from experience. The “technology” consists of basic digital tools like spreadsheets, email clients, and business applications, but humans must operate them entirely. Examples include customer service representatives manually responding to each email, financial analysts creating reports by gathering and analyzing data themselves, or Human Resources staff manually processing employee paperwork. The inefficiencies and limitations at this level—human error, fatigue, speed constraints, and scaling challenges—drove the development of basic automation.

一级 - 基于规则的自动化

Level 1 - Rule-Based Automation

就像配备基本巡航控制的汽车一样,这一阶段代表了我们迈向自动化的第一步。这些简单的“代理”遵循预设规则和固定工作流程——例如,基本脚本或RPA(机器人流程自动化)系统。它们可以处理数据录入或表单处理等重复性任务,但缺乏真正的智能和适应能力。它们需要完全的人工设置和监督,就像巡航控制系统只能保持设定的速度一样。

Like a car with basic cruise control, this level represents our first steps toward automation. These are simple “agents” that follow predetermined rules and fixed workflows—for example, basic scripts, or RPA (Robotic Process Automation) systems. They can handle repetitive tasks like data entry or form processing but have no real intelligence or adaptability. They require complete human setup and oversight, just as a cruise control system only maintains a set speed.

这些智能体掌握了重复性任务执行和简单工作流程跟踪的基本能力。它们主要依赖于屏幕抓取技术、基本流程记录和规则引擎。典型的RPA智能体可以处理发票,例如将Excel表格中的数据复制到会计系统;自动完成员工入职流程,例如将新员工信息填充到多个HR系统中;或者使用预定义模板管理日常电子邮件回复。其技术栈相对简单:基本工作流程自动化工具、基础脚本编写和RPA(机器人流程自动化)系统。亚马逊Echo、谷歌Home和苹果HomePod所具备的简单功能也属于此类。

These agents master the basic capabilities of repetitive task execution and simple workflow following. They rely primarily on screen-scraping technology, basic process recording, and rules engines. A typical RPA agent might handle invoice processing by copying data from Excel spreadsheets into accounting systems, automating employee onboarding by populating the new hire information into multiple HR systems, or managing routine email responses using predefined templates. The technology stack is relatively straightforward: basic workflow automation tools, basic scripting, and RPA (Robotic Process Automation) systems. The simple skills possessed by Amazon Echo, Google Home, and Apple HomePod also fall into this category.
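To make the rigidity concrete, here is a minimal, purely illustrative sketch of a Level 1 rule-based responder; the keywords and reply templates are invented for this example:

```python
# Minimal sketch of a Level 1 rule-based "agent": fixed keyword rules
# mapped to predefined reply templates. No learning, no adaptation --
# anything outside the rules falls through to a human.

RULES = {
    "invoice": "Thank you. Your invoice has been forwarded to accounting.",
    "password": "Please use the self-service portal to reset your password.",
    "refund": "Your refund request has been logged under standard policy.",
}

def respond(email_body: str) -> str:
    """Return a canned reply if a rule matches; otherwise escalate."""
    text = email_body.lower()
    for keyword, template in RULES.items():
        if keyword in text:
            return template
    return "ESCALATE: no rule matched; route to human operator."

print(respond("I lost my password"))     # matches the 'password' rule
print(respond("The server is on fire"))  # no rule -> human escalation
```

Everything this "agent" can do is enumerated up front by a human, which is exactly the limitation that motivates the next level.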

二级 - 智能自动化

Level 2 - Intelligent Automation

这一级别相当于能够同时控制速度和转向的高级驾驶辅助系统。这些系统将基础自动化与人工智能功能相结合,例如机器学习、自然语言处理和计算机视觉。它们可以处理非结构化数据、进行预测并执行需要认知能力的任务。然而,就像能够保持在车道内行驶但仍需人工监督的汽车一样,它们的运行仍然受到相当严格的参数限制,并且需要大量的人工监管。

This level is comparable to advanced driver assistance systems that can handle both speed and steering. These agents combine basic automation with AI capabilities like machine learning, natural language processing, and computer vision. They can process unstructured data, make predictions, and handle tasks requiring cognitive abilities. However, like a car that can stay in its lane but needs human supervision, they still operate within fairly rigid parameters and require significant human oversight.

人工智能赋予的各种认知能力极大地扩展了端到端流程自动化的潜力。常见的应用包括:客服人员能够回答常见问题、识别简单请求,并根据关键词将客户引导至合适的资源;文档处理系统能够从各种格式(例如 PDF 或图像)中提取信息并将其导入数据库;交易代理能够根据市场情况执行预先设定的金融交易。

The ability to automate end-to-end processes expands significantly thanks to AI enabling various cognitive capabilities. Common applications include customer service agents that can answer common questions, recognize simple requests, and direct customers to the right resources based on keywords. Other examples include document processing systems that can extract information from various formats, such as PDFs or images, and process them in a database, or trading agents that can execute pre-determined financial transactions based on market conditions.

这些智能体的技术基础融合了四项核心功能。视觉功能利用计算机视觉“看”并处理文档、图像和视觉信息。语言功能通过自然语言处理理解并生成人类交流内容。思考与学习功能运用机器学习模型分析数据、进行预测、对信息进行分类并优化决策。执行功能通过智能工作流工具和RPA协调所有操作,处理简单任务和复杂流程编排。有关这些功能的更多详细信息,请参阅我们的著作《智能自动化》。

The technical foundation of these agents combines four essential capabilities. The Vision capability uses computer vision to “see” and process documents, images, and visual information. The Language capability enables understanding and generating human communication through natural language processing. The Thinking & Learning capability employs machine learning models to analyze data, make predictions, classify information, and optimize decisions. The Execution capability coordinates all actions through intelligent workflow tools and RPA, handling both simple tasks and complex process orchestration. You can refer to our book “Intelligent Automation” for more details on these capabilities.
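The interplay of these four capabilities can be sketched as a toy pipeline. Every stage below is a stub standing in for a real model (OCR, NLP, an ML classifier); only the orchestration pattern, not the stubs, reflects how such systems are wired:

```python
# Illustrative sketch of the four Level 2 capabilities wired into one
# pipeline. Each stage is a stub standing in for a real model; the
# point is the hand-off from perception to understanding to decision
# to action.

def see(document: bytes) -> str:
    # Vision: in practice, OCR / computer vision would extract text.
    return document.decode("utf-8")

def understand(text: str) -> dict:
    # Language: in practice, NLP would parse fields from free text.
    return dict(part.split("=") for part in text.split(";"))

def decide(fields: dict) -> str:
    # Thinking & Learning: in practice, an ML model would classify.
    return "approve" if float(fields["amount"]) < 1000 else "review"

def execute(fields: dict, decision: str) -> str:
    # Execution: in practice, RPA/workflow tools would act on systems.
    return f"invoice {fields['id']}: {decision}"

def process_invoice(document: bytes) -> str:
    fields = understand(see(document))
    return execute(fields, decide(fields))

print(process_invoice(b"id=INV-42;amount=250"))
```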

第三级 – 智能体工作流程

Level 3 – Agentic Workflows

这些智能体类似于能够在高速公路上自主行驶的车辆,但在复杂情况下需要人类接管。它们可以生成内容(文本、图像、视频),并具备一定的行动规划、推理和记忆能力。它们在预设的边界和专业领域内表现良好,能够适应环境的某些变化。然而,它们在处理微妙、复杂或新颖的情况以及复杂的决策时仍然面临挑战。

These agents are similar to vehicles that can navigate highways independently but require human takeover in complex situations. They can generate content (text, images, videos) and have some ability to plan their actions, reason, and memorize. They work well within predefined boundaries and expertise domains, adapting to some variations in their environment. However, they still struggle with nuanced, complex, or novel situations and complex decisions.

这些智能体掌握了情境决策和从反馈中学习的基本技能。例如,数字助理可以管理员工入职流程,包括发送文件、安排培训、回答常见问题以及标记异常请求以供人工审核。其他例子包括:交易智能体根据市场状况执行复杂的金融交易;以及内容创作智能体跨多个渠道制作和优化营销材料。

These agents master contextual decision-making and basic learning from feedback. For example, a digital assistant can manage employee onboarding by sending paperwork, scheduling training, answering common questions, and flagging unusual requests for human review. Other examples include trading agents that execute complex financial transactions based on market conditions and content creation agents that produce and optimize marketing materials across multiple channels.

这些智能体所采用的技术在前代技术的基础上,新增了三个关键组件。大型语言模型赋予了智能体规划、推理和内容生成能力。基础记忆和学习系统,特别是用于自适应行为的强化学习,进一步增强了这些能力。最后,这些智能体还整合了工具操作能力,主要集中于数字界面,但在物理世界交互方面仍存在局限性。

The technology powering these agents builds upon previous levels, adding three crucial components. Large language models enable planning, reasoning, and content generation capabilities. This is augmented by basic memory and learning systems, particularly reinforcement learning for adaptive behavior. Finally, these agents incorporate tool manipulation capabilities, primarily focused on digital interfaces, though still limited in physical world interactions.
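A hedged sketch of this Level 3 pattern, with the LLM planner and the tools replaced by stubs (the goal, step names, and outcomes here are invented):

```python
# Sketch of a Level 3 agentic workflow: a planner (an LLM in a real
# system, a stub here) breaks a goal into steps, each step invokes a
# tool, and a simple memory records outcomes for later reasoning.

def plan(goal: str) -> list[str]:
    # Stand-in for an LLM call that decomposes the goal into steps.
    if goal == "onboard new hire":
        return ["draft_email", "schedule_training"]
    return []

TOOLS = {
    "draft_email": lambda: "welcome email drafted",
    "schedule_training": lambda: "training scheduled for week 1",
}

def run_agent(goal: str) -> list[str]:
    memory: list[str] = []          # basic episodic memory
    for step in plan(goal):
        outcome = TOOLS[step]()     # tool use, digital interfaces only
        memory.append(f"{step}: {outcome}")
    return memory

for entry in run_agent("onboard new hire"):
    print(entry)
```

The loop is the important part: the plan is generated rather than hard-coded, which is what separates this level from the fixed workflows of Levels 1 and 2.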

第四级——半自主智能体系统

Level 4 – Semi-Autonomous Agentic Systems

就像自动驾驶汽车能在特定条件下(天气良好、地图清晰)完全自主运行一样,这些人工智能代理也能在限定的范围内独立工作。它们可以理解目标,将目标分解成步骤,从结果中学习,并调整策略——但仅限于其工作范围之内。他们具备专业技能,展现出高度的灵活性,能够处理复杂的任务,但他们的自主权仍然局限于特定领域。

Like self-driving cars that can operate fully autonomously in specific conditions (good weather, mapped areas), these AI agents work independently within defined domains. They can understand goals, break them down into steps, learn from outcomes, and adapt strategies—but only within their area of expertise. They demonstrate high flexibility and handle complex tasks, though their autonomy remains limited to specific domains.

这些智能体在其各自领域内精通目标分解、战略规划和实时自适应学习。它们运用先进的人工智能架构,包括递归自我改进、因果推理模型和多智能体协调系统。应用领域包括:能够设计和执行复杂科学实验的研究智能体;能够分析患者数据并推荐治疗方案的医疗诊断智能体;以及能够追踪用户偏好、提出预算建议并根据消费模式调整推荐方案的财务顾问智能体。

These agents master goal decomposition, strategic planning, and real-time adaptive learning within their domains. They utilize advanced AI architectures, including recursive self-improvement, causal reasoning models, and multi-agent coordination systems. Applications include research agents that can design and execute complex scientific experiments, medical diagnosis agents that analyze patient data and recommend treatment plans, and financial advisor agents that track user preferences, suggest budget plans, and adjust recommendations based on spending patterns.

这些智能体的技术架构在复杂性和功能性方面都取得了显著进步。先进的推理和规划系统构成了其认知核心,并与记忆和学习机制协同工作,从而实现实时适应。此外,它们还具备增强的工具操控能力,能够操作数字和物理界面,但仍局限于既定领域内。

The technological architecture of these agents represents a significant advancement in complexity and capability. Advanced reasoning and planning systems form the cognitive core, working in concert with memory and learning mechanisms that enable real-time adaptation. These are complemented by enhanced tool manipulation capabilities that extend to both digital and physical interfaces, though still within defined domains.

第五级——完全自主的智能体系统

Level 5 – Fully Autonomous Agentic Systems

这代表了理论上的巅峰——就像一辆完全自动驾驶的汽车,可以在任何条件下行驶。这些智能体将是真正的自主系统,能够理解任何目标,制定策略,从经验中学习,并适应跨领域的新情况。它们可以与其他系统无缝集成,构建自己的工作流程,并在保持与人类价值观和目标一致的同时,独立做出复杂的决策。

This represents the theoretical pinnacle—like a fully autonomous vehicle that can drive anywhere under any conditions. These agents would be truly autonomous systems capable of understanding any goal, developing strategies, learning from experience, and adapting to new situations across domains. They could seamlessly integrate with other systems, construct their own workflows, and make complex decisions independently while maintaining alignment with human values and objectives.

这些理论上的智能体将掌握通用问题解决、跨领域迁移学习和自主目标设定等技能。其应用可能包括:能够处理个人和职业领域任何任务的通用个人助理;能够自主管理整个运营的商务经理;以及能够在各个领域做出创新科学发现的研究智能体。

These theoretical agents would master general problem-solving, cross-domain transfer learning, and autonomous goal setting. Applications might include universal personal assistants handling any task across personal and professional domains, autonomous business managers running entire operations, and research agents making novel scientific discoveries across fields.

这些智能体的理论技术栈需要超越当前最先进水平的能力。一个完整的自主系统需要集成先进的记忆系统、复杂的学习机制以及跨领域的实时适应能力。这将需要尚未开发的超级人工智能框架,以及强大的安全协议和复杂的伦理推理系统,以确保其符合人类价值观。

The theoretical technology stack for these agents would require capabilities beyond the current state-of-the-art. A full-scale autonomous system would need to integrate advanced memory systems, sophisticated learning mechanisms, and real-time adaptability across any domain. This would demand yet-to-be-developed frameworks for artificial superintelligence, coupled with robust safety protocols and complex ethical reasoning systems to ensure alignment with human values.

人工智能代理的演进:我们目前所处的位置

Evolution of AI Agents: Where We Stand Today

这一演进的关键不仅在于个体能力的日益精进,更在于多种能力的整合与协调,最终形成连贯、目标明确的系统。正如L5级自动驾驶汽车需要无缝整合感知、预测、规划和控制功能一样,先进的人工智能体也需要多种人工智能技术和能力的平滑集成才能实现其目标。

Critical to this evolution is not just the increasing sophistication of individual capabilities but the integration and orchestration of multiple capabilities into coherent, purposeful systems. Just as a Level 5 autonomous vehicle needs to seamlessly combine perception, prediction, planning, and control, advanced AI agents require the smooth integration of multiple AI technologies and capabilities to achieve their goals.

目前,市面上大多数人工智能代理的运行级别为2级或3级,一些专业系统在特定领域可达到4级。与全自动驾驶汽车一样,5级代理仍然是未来的目标——它既带来了令人兴奋的可能性,也引发了关于控制、安全和人为监督的重要问题。

Currently, most AI agents on the market operate at Levels 2 or 3, with some specialized systems reaching Level 4 in narrow domains. Like fully autonomous vehicles, Level 5 agents remain a future goal—one that raises both exciting possibilities and important questions about control, safety, and human oversight.

这种演进不仅仅是提高自动化程度,更重要的是开发出能够不断理解上下文、从经验中学习并独立决策,同时又与人类意图和价值观保持一致的系统。每个层级都建立在前一个层级的能力之上,从而创造出越来越复杂的智能体,这些智能体能够以更高的自主性处理更复杂、更细致的任务。

This evolution isn’t just about increasing automation—it’s about developing systems that can increasingly understand context, learn from experience, and make independent decisions while remaining aligned with human intentions and values. Each level builds upon the capabilities of the previous ones, creating increasingly sophisticated agents that can handle more complex and nuanced tasks with greater autonomy.

渐进式自主的魔力:理解人工智能代理级别

The Magic of Progressive Autonomy: Understanding AI Agent Levels

当我们最初开始接触人工智能代理时,我们常常会被最先进的功能所吸引。这似乎合情合理——为什么不利用最先进的技术呢?但经验告诉我们一个基本真理,我们现在称之为人工智能代理的黄金法则:越简单越好。这不仅仅是技术上的极简主义,而是要在每个具体应用中,找到功能和控制之间的最佳平衡点。

When we first started working with AI agents, we often found ourselves drawn to the most advanced capabilities available. It seemed logical—why not leverage the most sophisticated technology possible? But experience has taught us a fundamental truth that we now call the Golden Rule of AI agents: the simpler, the better. This isn’t just about technological minimalism; it’s about finding the right balance between capability and control for each specific application.

自主性的演进:一个自然的过程

The Evolution of Autonomy: A Natural Progression

想想我们是如何教新员工处理电子邮件的。一开始,我们可能会提供详细的步骤说明:“打开邮件客户端,点击‘新建邮件’,输入收件人地址,然后写这段文字……”随着经验的积累,我们的指导会变得更加宽泛:“起草一份回复客户的咨询,保持我们一贯的专业语气。”最终,我们可能只会说:“负责处理与客户的沟通,并及时向他们汇报项目进展。”

Think about how we teach a new employee to handle email communications. At first, we might provide detailed, step-by-step instructions: “Open your email client, click ‘New Message,’ enter the recipient’s address, write this specific text...” As they gain experience, our instructions become broader: “Draft a response to this client inquiry, maintaining our usual professional tone.” Eventually, we might simply say, “Handle our client communications and keep them informed about project progress.”

这一发展过程同样适用于人工智能代理:

This same progression applies to AI agents:

在第 1 级和第 2 级,我们需要指定每个操作:“点击‘新建电子邮件’按钮,复制此文本,将其粘贴到此处,输入此地址,点击发送。”

At Levels 1 and 2, we need to specify every action: “Click the button ‘new email,’ copy this text, paste it here, enter this address, click send.”

到了第三级,我们可以说:“利用这些信息撰写并发送电子邮件,并根据收件人的实际情况进行调整。”

By Level 3, we can say, “Write and send an email using this information, adapting it to the recipient’s context.”

在第 4 级,我们可能会简单地指示:“处理所有客户沟通,确保他们充分了解情况。”

At Level 4, we might simply direct, “Handle all client communications to ensure they’re well-informed.”

而到了第五级——尽管这一级目前仍处于理论阶段——我们或许可以只设定一个目标,比如“通过客户满意度提升销售额”,然后让代理决定所有必要的行动和沟通。

And at Level 5—though this level remains theoretical—we could potentially just set a goal like “Grow sales through customer satisfaction” and let the agent determine all necessary actions and communications.
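The shrinking of instructions can be shown as data; the wording at each level is invented for illustration:

```python
# The shrinking-instruction pattern, sketched as data: the same email
# task expressed at different levels. Higher levels need fewer, more
# abstract instructions for the same outcome.

INSTRUCTIONS = {
    1: ["click 'new email'", "paste template text", "enter address", "click send"],
    3: ["write and send an email using this info, adapted to the recipient"],
    4: ["handle all client communications"],
    5: ["grow sales through customer satisfaction"],  # still theoretical
}

for level, steps in sorted(INSTRUCTIONS.items()):
    print(f"Level {level}: {len(steps)} instruction(s)")
```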

智能体人工智能发展框架:不仅仅是成熟度

The Agentic AI Progression Framework: More Than Just Maturity

这个框架并非传统的成熟度模型,并非级别越高越好。相反,您可以将其视为不同类型代理的目录,每种代理都适用于特定的需求和场景。这类似于现代汽车中的驾驶辅助技术。虽然在高速公路上完全自动驾驶在技术上是可行的,但许多驾驶员更喜欢基本巡航控制带来的可预测性和控制性。“最佳”级别完全取决于您的具体需求和情况。

This framework isn’t a traditional maturity model where higher levels are always better. Instead, think of it as a catalog of different agent types, each suited for specific needs and contexts. It’s similar to the driving assistance technologies in modern cars. While fully autonomous driving might be technically possible on highways, many drivers prefer the predictability and control of basic cruise control. The “best” level depends entirely on your specific needs and circumstances.

让我们来探讨随着我们在智能体人工智能发展框架中不断进步,关键方面是如何演变的:

Let’s explore how key aspects evolve as we move up the Agentic AI Progression Framework:


图 1.3:智能体人工智能发展框架(来源:© Bornet 等人)

Figure 1.3: The Agentic AI Progression Framework (Source: © Bornet et al.)

自主与控制:微妙的平衡

Autonomy and Control: A Delicate Balance

随着层级的提升,自主性逐渐增强,而人类的直接控制则逐渐减弱。在第一层级,智能体如同程序编写精良的机器,严格按照指令运行,结果可预测。第二层级引入了基本的决策能力,但仍受限于既定的参数。第三层级标志着智能体能力的显著提升,它们能够理解上下文并据此调整策略。第四层级和第五层级则展现出日益复杂的自主行为,智能体能够设定自己的子目标,并制定原创策略来实现这些目标。

As we progress through the levels, autonomy increases while direct human control decreases. At Level 1, agents operate like well-programmed machines, following exact instructions with predictable outcomes. Level 2 introduces basic decision-making capabilities but still within rigid parameters. Level 3 represents a significant leap, with agents capable of understanding context and adapting their approach accordingly. Levels 4 and 5 introduce increasingly sophisticated autonomous behavior, with agents capable of setting their own sub-goals and developing original strategies to achieve them.

然而,这种自主性的增强伴随着直接控制的减少。这类似于养育孩子——随着孩子独立性的增强,你的直接控制自然会减少,取而代之的是指导和监督。这种权衡在医疗保健、金融或法律合规等敏感领域尤为重要,因为在这些领域,可预测性和问责制至关重要。

However, this increased autonomy comes with reduced direct control. It’s similar to raising a child—as they develop more independence, your direct control naturally decreases, replaced by guidance and oversight. This trade-off becomes particularly important in sensitive domains like healthcare, finance, or legal compliance, where predictability and accountability are crucial.

教学悖论:少即是多

The Instruction Paradox: Less is More

发展框架最引人入胜之处在于,随着能力的提升,指令也变得越来越简单。在较低层次,指令必须详尽明确,例如编写基础机器的程序。随着能力的提升,指令变得更加目标导向和抽象。这种转变反映了人类的发展过程——我们从给孩子提供循序渐进的指令,逐渐过渡到仅仅分享目标并信任他们的判断。

One of the most fascinating aspects of the Progression Framework is how instructions become simpler as capabilities become more sophisticated. At lower levels, instructions must be detailed and explicit, like programming a basic machine. As we move up, instructions become more goal-oriented and abstract. This shift mirrors human development—we move from giving children step-by-step instructions to simply sharing objectives and trusting their judgment.

实施与学习动态

Implementation and Learning Dynamics

与直觉相反,高级智能体部署速度可能更快,因为它们可以通过试错学习。然而,它们需要更复杂的监督和风险管理系统。低级智能体虽然需要更详细的初始编程,但行为更可预测,也更容易监管。在为特定应用选择合适的级别时,这种权衡至关重要。

Counter-intuitively, higher-level agents might be quicker to deploy because they can learn through trial and error. However, they require more sophisticated oversight and risk management systems. Lower-level agents, while requiring more detailed initial programming, offer more predictable behavior and easier oversight. This trade-off becomes crucial when choosing the appropriate level for specific applications.

从简单入手的智慧

The Wisdom of Starting Simple

为什么我们提倡从更简单、更底层的智能体入手?想想学习乐器的过程。你不会一开始就挑战复杂的交响乐——你会从基本的音阶和简单的乐曲开始,逐步提升技能和理解力。这种方法能让你在应对更复杂的挑战之前,先掌握正确的技巧和理解。

Why do we advocate starting with simpler, lower-level agents? Consider learning to play a musical instrument. You don’t start with complex symphonies—you begin with basic scales and simple pieces, gradually building your skills and understanding. This approach allows you to develop proper techniques and understanding before tackling more complex challenges.

同样,从较低层级的代理入手可以让组织:

Similarly, starting with lower-level agents allows organizations to:

1.在受控环境中熟悉人工智能代理。

1. Build familiarity with AI agents in a controlled environment

2.建立适当的监督和治理机制

2. Develop proper oversight and governance mechanisms

3.了解不同能力水平的实际意义

3. Understand the practical implications of different capability levels

4.创建适当的护栏和控制系统

4. Create appropriate guardrails and control systems

5.建立组织信心和能力

5. Build organizational confidence and competence

选择合适的级别

Choosing the Right Level

成功实施的关键在于为每个具体应用选择合适的级别。以一家金融服务公司为例,该公司在部署人工智能代理时,可能会选择 1 级或 2 级代理来处理交易,因为交易处理对可预测性和审计追踪至关重要。然而,在客户服务方面,他们可能会部署 3 级代理,因为客户服务更注重适应性和情境感知能力,而非严格的控制。

The key to successful implementation lies in choosing the appropriate level for each specific application. Consider a financial services company implementing AI agents. They might choose Level 1 or 2 agents for transaction processing, where predictability and audit trails are crucial. However, they might implement Level 3 agents for customer service, where adaptability and context awareness are more valuable than strict control.

这种在选择适当自主级别方面的灵活性是该框架的关键优势。它允许组织根据自身特定需求、风险承受能力和监管要求来匹配代理的能力。其目标并非实现最大程度的自主性,而是针对每个应用场景找到独立性和控制力之间的平衡。

This flexibility in choosing the appropriate level of autonomy is a key strength of the framework. It allows organizations to match agent capabilities to their specific needs, risk tolerance, and regulatory requirements. The goal isn’t to achieve maximum autonomy but to find the right balance between independence and control for each application.

展望未来:一种务实的方法

Moving Forward: A Practical Approach

在您开始使用人工智能代理时,请记住黄金法则:越简单越好。即使您的最终目标是实现更高级的自主系统,也请从较低级别的代理入手。利用这段时间积累知识,建立适当的控制机制,并培养成功使用更高级代理所需的组织能力。

As you begin your journey with AI agents, remember the Golden Rule: the simpler, the better. Start with lower-level agents, even if your ultimate goal is to implement more autonomous systems. Use this time to build understanding, establish proper controls, and develop the organizational capabilities needed for success with more advanced agents.

***

***

虽然我们的智能体人工智能发展框架提供了一种结构化的方法来评估人工智能体的能力,但仅仅了解各个层级是不够的。要真正理解这些数字智能体的独特之处,我们需要超越框架本身,探索定义它们的独特特征。在下一章中,我们将带您深入了解人工智能体的思维,审视它们卓越的能力和固有的局限性——这些洞见在您开始与这些新的数字同事合作时将至关重要。

While our Agentic AI Progression Framework provides a structured way to evaluate AI agents’ capabilities, understanding the levels alone isn’t enough. To truly grasp what makes these digital minds unique, we need to look beyond the framework and explore the distinctive characteristics that define them. In the next chapter, we’ll take you inside the mind of an AI agent, examining both their remarkable abilities and inherent limitations—insights that will prove crucial as you begin working with these new digital colleagues.

第三章

CHAPTER 3

走进人工智能代理的内心世界

INSIDE THE MIND OF AN AI AGENT

我们在上一章探讨的智能体人工智能发展框架提供了一种结构化的方法来评估人工智能体的能力,但仅仅了解各个层级是不够的。要真正理解这些数字智能体的独特之处,我们需要超越框架本身,探究定义它们的独特特征。本章将带您深入了解人工智能体的思维,探索它们卓越的能力和固有的局限性——这些洞见在您开始与这些新的数字同事合作时将至关重要。

The Agentic AI Progression Framework we explored in the previous chapter provides a structured way to evaluate AI agents’ capabilities, but understanding the levels alone isn’t enough. To truly grasp what makes these digital minds unique, we need to look beyond the framework and examine the distinctive characteristics that define them. This chapter takes you inside the mind of an AI agent, exploring both their remarkable abilities and inherent limitations—insights that will prove crucial as you begin working with these new digital colleagues.

人工智能代理的关键特性

Key Specificities of AI Agents

我们在组织机构部署人工智能代理的经验中,观察到一些使其拥有独特强大功能的基本特征。让我们来探讨这些关键特征,了解人工智能代理如何区别于传统自动化工具,并阐明其变革潜力。

In our experience implementing AI agents across organizations, we’ve observed several fundamental characteristics that make them uniquely powerful. Let’s explore these defining traits that set AI agents apart from traditional automation tools and help explain their transformative potential.

数字员工,不仅仅是工具

Digital Workers, Not Just Tools

人工智能代理与传统软件工具之间的区别非常显著。传统自动化就像一条高效的装配线——固定、可预测,且仅限于特定任务。而人工智能代理则更像是一位技能娴熟的数字员工,能够独立思考、适应变化并处理复杂情况。正如人类客服代表可以处理从简单咨询到复杂问题解决的各种事务一样,人工智能代理可以管理端到端流程,做出决策,并根据上下文调整其方法。

The distinction between AI agents and traditional software tools is profound. Traditional automation is like having a highly efficient assembly line—fixed, predictable, and limited to specific tasks. An AI agent, on the other hand, functions more like a skilled digital employee who can think, adapt, and handle complex situations independently. Just as a human customer service representative might handle everything from simple inquiries to complex problem-solving, an AI agent can manage end-to-end processes, make decisions, and adjust its approach based on context.

与现有系统并行运行

Operating Alongside Existing Systems

人工智能代理最实用的优势之一在于它们能够与您现有的技术基础设施协同工作,而不是取而代之。您可以将它们视为数字员工,它们懂得如何高效地操作和使用您所有的不同系统。

One of the most practical advantages of AI agents is their ability to work with your existing technology infrastructure rather than replace it. Think of them as digital workers who know how to navigate and utilize all your different systems effectively.

这一特性尤为重要,因为企业通常会在企业系统(例如ERP系统、客户关系管理系统(CRM)、人力资源管理系统等)上投入巨资。人工智能代理可以与这些系统集成,从多个数据源提取数据,跨平台执行流程,并弥补自动化方面的不足。例如,财务人工智能代理可以同时与SAP(用于交易数据)、Salesforce(用于客户信息)和Excel(用于自定义报表)协同工作,协调这些系统以生成洞察并实现复杂流程的自动化。

This characteristic is particularly valuable because organizations have typically invested heavily in their enterprise systems—ERPs, customer relationship management systems (CRM), HR management systems, and more. AI agents can integrate with these systems, pulling data from multiple sources, executing processes across platforms, and filling automation gaps. For example, a financial AI agent might work simultaneously with SAP for transaction data, Salesforce for customer information, and Excel for custom reports, coordinating between these systems to generate insights and automate complex processes.

全天候运营的力量

The Power of 24/7 Operations

与需要休息和轮班工作的人类员工不同,人工智能代理能够持续运行。这不仅仅是工作时间更长——而是要保持人类员工无法企及的持续警惕和响应能力。

Unlike human workers who need breaks and operate in shifts, AI agents maintain constant operation. This isn’t just about working longer hours—it’s about maintaining continuous vigilance and responsiveness at a level that would be impossible for human workers.

以银行业欺诈检测为例。人工分析师一次只能监控有限数量的交易,而且可能因疲劳而忽略一些细微的模式。而人工智能代理可以跨多个时区持续监控数百万笔交易,即时识别可疑模式并立即采取行动。这种持续运行在网络安全等领域尤为重要,因为威胁可能随时出现;在全球客户服务运营中,咨询也全天候涌入,因此这种持续运行至关重要。

Consider fraud detection in banking. A human analyst can only monitor a limited number of transactions and might miss subtle patterns due to fatigue. An AI agent can continuously monitor millions of transactions across multiple time zones, instantly identifying suspicious patterns and taking immediate action. This constant operation becomes particularly crucial in areas like cybersecurity, where threats can emerge at any moment, or in global customer service operations, where inquiries come in around the clock.
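A simple sketch of what uniform, fatigue-free screening looks like; the suspicion rule (amount threshold plus unusual hour) is an invented stand-in for real fraud models:

```python
# Hedged sketch of round-the-clock transaction monitoring: one
# screening function applied identically to every event, with no
# fatigue effect. The rule itself is invented for illustration.

def is_suspicious(txn: dict) -> bool:
    large = txn["amount"] > 10_000
    odd_hour = txn["hour"] < 5          # e.g. 00:00-04:59 local time
    return large and odd_hour

def monitor(stream) -> list[str]:
    """Flag suspicious transactions; in production this runs 24/7."""
    return [t["id"] for t in stream if is_suspicious(t)]

transactions = [
    {"id": "t1", "amount": 120,    "hour": 14},
    {"id": "t2", "amount": 25_000, "hour": 3},   # large AND at 3 AM
    {"id": "t3", "amount": 25_000, "hour": 11},
]
print(monitor(transactions))  # ['t2']
```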

无限可扩展性

Infinite Scalability

AI代理的可扩展性从根本上改变了企业管理能力的方式。传统的增长方式需要招聘、培训和逐步构建能力。而借助AI代理,扩展速度更快,几乎不受限制。需要处理十倍的客户咨询?只需几分钟,而不是几个月,即可部署额外的代理实例。

The scalability of AI agents represents a fundamental shift in how organizations can manage capacity. Traditional growth requires hiring, training, and gradually building capabilities. With AI agents, scaling is instant and virtually unlimited. Need to handle ten times more customer inquiries? You can deploy additional agent instances in minutes, not months.

这种可扩展性不仅限于处理业务量,还包括适应新情况和学习新技能的能力。试想一下,能够立即将公司最优秀的员工复制到多个地点或部门,并保持一致的质量和绩效。这种能力在应对意料之外的需求高峰或开拓新市场时尤为重要。

This scalability extends beyond just handling volume—it includes the ability to adapt to new situations and learn new skills. Imagine being able to instantly replicate your best performer across multiple locations or departments, maintaining consistent quality and performance. This capability becomes particularly valuable during unexpected demand spikes or when entering new markets.

普遍适用性

Universal Applicability

人工智能代理最强大的优势之一在于其能够跨行业和业务职能高效运行。传统的人工智能解决方案通常需要针对每个行业进行大量定制,而人工智能代理则遵循可广泛应用于各个领域的通用原则。

One of the most powerful aspects of AI agents is their ability to function effectively across different industries and business functions. While traditional AI solutions often require extensive customization for each industry, AI agents operate with generalizable principles that apply broadly across sectors.

无论是为银行分析风险、为制造商优化供应链,还是为医疗机构处理患者咨询,他们的核心SPAR能力——感知、规划、行动和反思——始终保持一致。他们能够在保持基本运营原则的同时,获取行业和公司特定的知识,这使得他们拥有极强的适应能力。

Their core SPAR capabilities—sensing, planning, acting, and reflecting—remain consistent whether they’re analyzing risk for a bank, optimizing supply chains for a manufacturer, or handling patient inquiries for a healthcare provider. They can acquire industry and company-specific knowledge while maintaining their fundamental operating principles, making them incredibly versatile.
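The SPAR loop can be sketched as a minimal skeleton in which each phase is a stub; only the sense-plan-act-reflect structure, which the text says stays constant across industries, is the point:

```python
# The SPAR loop (sense, plan, act, reflect) as a minimal skeleton.
# Each phase is a stub; the invariant loop structure is what carries
# across banking, manufacturing, or healthcare deployments.

def spar_step(environment: dict, memory: list) -> str:
    observation = environment["signal"]                             # Sense
    action = "raise_alert" if observation == "anomaly" else "log"   # Plan
    result = f"did {action}"                                        # Act
    memory.append((observation, action, result))                    # Reflect
    return result

memory: list = []
print(spar_step({"signal": "anomaly"}, memory))
print(spar_step({"signal": "normal"}, memory))
print(len(memory), "experiences recorded")
```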

协作的力量

The Power of Collaboration

人工智能代理最精妙的特性或许在于其协作能力——既能与人类协作,也能与其他代理协作。它们必要时可以独立工作,但真正的优势体现在协作环境中。不妨将它们视为团队成员,能够无缝融入现有工作流程,为人类员工提供支持,而非取代他们。

Perhaps the most sophisticated characteristic of AI agents is their ability to collaborate—both with humans and other agents. They can work independently when needed, but their real power emerges in collaborative settings. Think of them as team players who can seamlessly integrate into existing workflows, supporting human workers rather than replacing them.

例如,在内容营销领域,一个人工智能代理可以生成初稿,另一个可以进行搜索引擎优化,第三个可以管理发布日程——所有这些代理都会与提供战略指导和质量控制的人类编辑协作。这创建了一种强大的混合工作流程,充分利用了人类和人工智能的优势。

In a content marketing context, for example, one AI agent might generate initial drafts, while another optimizes for SEO, and a third manages publication scheduling—all while collaborating with human editors who provide strategic direction and quality control. This creates a powerful hybrid workflow that leverages the strengths of both human and artificial intelligence.

这些特征为何重要

Why These Characteristics Matter

当智能体人工智能系统设计精良且部署到位时,其各项特性结合起来将创造出真正具有革命性的成果:一支灵活、可扩展的数字化劳动力队伍,能够提升任何组织的运营效率。以典型的客户服务流程为例:人工智能代理可以全天候处理日常咨询,在高峰期迅速扩展规模,跨多个部门和系统工作,并与人工客服协作处理复杂案例——所有这些都将持续学习和改进。

When agentic AI systems are well-designed and implemented, their characteristics combine to create something truly revolutionary: a flexible, scalable digital workforce that can enhance operations across any organization. Consider a typical business process like customer service: AI agents can handle routine inquiries 24/7, instantly scale during peak periods, work across multiple departments and systems, and collaborate with human agents on complex cases—all while continuously learning and improving.

对于那些希望有效实施人工智能代理的人来说,理解这些特征至关重要。它们有助于解释人工智能代理为何代表着如此重大的进步,以及它们为何有潜力彻底改变组织的运作方式。在接下来的章节中,我们将探讨如何有效地利用这些特性,确保您的组织能够最大限度地发挥人工智能代理的优势。

Understanding these characteristics is crucial for those looking to implement AI agents effectively. They help explain why AI agents represent such a significant advance and why they have the potential to transform how organizations operate. In the following chapters, we’ll explore how to leverage these characteristics effectively, ensuring your organization can make the most of what AI agents have to offer.

人工智能代理的固有局限性

Inherent Limitations of AI Agents

在探索人工智能代理的潜力时,了解其局限性至关重要。正如我们不会期望人类员工在所有方面都完美无缺一样,人工智能代理也存在其自身固有的局限性。让我们以清晰和诚实的态度审视这些局限性,以便有效地部署这些技术。

As we explore the potential of AI agents, it’s crucial to understand their limitations. Just as we wouldn’t expect a human employee to be perfect at everything, AI agents have their own set of inherent constraints. Let’s examine these limitations with the clarity and honesty needed to deploy these technologies effectively.

智能模拟

The Simulation of Intelligence

人工智能代理最根本的局限性在于其“智能”的本质。虽然它们可以处理海量信息并生成复杂的响应,但它们对世界的理解方式与人类截然不同。它们仅仅是根据过去的模式对未来进行预测。这就像一个技艺精湛的演员,他能完美地背诵台词,但却无法真正感受到所扮演角色的情感,也无法理解这种情感与特定情境的关联。

The most fundamental limitation of AI agents lies in the nature of their “intelligence.” While they can process vast amounts of information and generate sophisticated responses, they don’t understand the world in the way humans do. They’re simply making predictions based on past patterns about what comes next. Think of it like an incredibly skilled actor who can perfectly deliver lines but doesn’t actually feel the emotions they’re portraying or understand the relevance of the emotion to the specific context.

这一点在复杂的专业场景中尤为明显。法律人工智能代理或许能够精准地概括合同条款,但它无法真正理解塑造特定情况法律解释的根本正义或公平原则。同样,医疗保健人工智能代理可以分析症状并提出治疗建议,但它缺乏经验丰富的医生通过多年与患者的互动所培养的直觉理解力。

This becomes particularly evident in complex professional scenarios. A legal AI agent might expertly summarize a contract’s terms, but it won’t truly grasp the underlying principles of justice or equity that shape the legal interpretation of given situations. Similarly, a healthcare AI agent can analyze symptoms and suggest treatments, but it lacks the intuitive understanding that experienced doctors develop through years of patient interaction.

数据质量困境

The Data Quality Dilemma

人工智能代理从根本上依赖于输入数据的质量。这不仅仅是技术上的限制,更是一个影响其运行方方面面的根本制约因素。这就像使用地图导航一样:如果地图过时或不准确,即使是最好的导航员也会走错路。

AI agents are fundamentally dependent on the quality of their data inputs. This isn’t just a technical limitation—it’s a fundamental constraint that affects every aspect of their operation. Think of it like trying to navigate using a map: if the map is outdated or inaccurate, even the best navigator will make wrong turns.

这种依赖性在商业环境中尤为关键。例如,在分析财务数据时,人工智能代理无法独立验证其处理数据的准确性。如果大语言模型(LLM)对提示做出响应,它将根据训练数据做出反应——无论这些数据是否正确。如果输入数据存在错误,这些错误必然会影响最终结论。与人类分析师可能会注意到某些数字“感觉不对劲”不同,人工智能代理会自信地处理错误数据,而不会发出任何警告。

This dependency becomes particularly critical in business contexts. When analyzing financial data, for instance, an AI agent can’t independently verify the accuracy of the numbers it’s processing. If an LLM responds to a prompt, it will react based on the data on which it was trained—whether it is correct or not. If there are errors in the input data, these will inevitably propagate to the conclusions. Unlike human analysts who might notice when numbers “don’t feel right,” AI agents will confidently process incorrect data without raising red flags.

常识差距

The Common-sense Gap

人工智能体最显著的局限性之一是缺乏常识推理能力。虽然它们能够处理复杂的计算并遵循复杂的规则,但它们常常忽略任何人类都能凭直觉理解的显而易见的现实世界限制。

One of the most striking limitations of AI agents is their lack of common-sense reasoning. While they can process complex calculations and follow sophisticated rules, they often miss obvious real-world constraints that any human would instinctively understand.

试想一下,如果一个日程安排人工智能代理建议在凌晨3点安排一场重要的客户会议,或者一个旅行规划代理在飓风期间推荐户外活动,这些错误并非仅仅是令人啼笑皆非——它们凸显了人工智能代理在缺乏显式编程的情况下理解上下文和现实世界限制的能力存在根本局限性。

Consider a scheduling AI agent suggesting a crucial client meeting at 3 AM or a travel planning agent recommending outdoor activities during a hurricane. These aren’t just amusing errors—they highlight a fundamental limitation in AI agents’ ability to understand context and real-world constraints without explicit programming.

创造力之谜

The Creativity Conundrum

虽然人工智能代理在优化和迭代方面可以发挥强大的作用,但它们在真正的创造力和创新方面却力不从心。它们擅长对现有模式进行重组,但很少能真正产生原创的想法。可以把它们看作是混音大师,而不是原创作曲家。

While AI agents can be powerful tools for optimization and iteration, they struggle with true creativity and innovation. They excel at remixing existing patterns but rarely generate truly original ideas. Think of them as master remixers rather than original composers.

在创意领域,这种局限性尤为明显。人工智能代理可以生成现有设计或写作风格的变体,但它无法创作出能够重塑行业或开创全新艺术运动的突破性作品。真正创新的火花——那种人类独有的建立意想不到的联系和想象全新可能性的能力——仍然遥不可及。

In creative fields, this limitation becomes particularly apparent. An AI agent can generate variations on existing designs or writing styles, but it won’t produce the kind of breakthrough creative work that reshapes industries or creates new artistic movements. The spark of true innovation—that uniquely human ability to make unexpected connections and imagine entirely new possibilities—remains beyond their reach.

幻觉问题

The Hallucination Problem

人工智能代理最令人担忧的局限性之一是它们容易“产生幻觉”——即高置信度地生成错误信息。这并非简单的错误,而是这些系统处理信息和生成响应方式的根本缺陷。

One of the most concerning limitations of AI agents is their tendency to “hallucinate”—generating false information with high confidence. This isn’t a simple error; it’s a fundamental limitation of how these systems process information and generate responses.

试想一下,如果一个医疗人工智能代理自信地推荐一种根本不存在的药物,或者一个法律人工智能代理引用虚构的案例法,这不仅仅是错误——它们是人工智能代理用听起来合情合理但完全捏造的信息来填补自身知识空白的例子。这种倾向使得人工监督至关重要,尤其是在高风险情况下。

Consider a medical AI agent confidently recommending a non-existent drug or a legal AI agent citing fictional case law. These aren’t just mistakes—they’re examples of AI agents filling in gaps in their knowledge with plausible-sounding but entirely fabricated information. This tendency makes human oversight crucial, particularly in high-stakes situations.

伦理与判断的差距

The Ethics and Judgment Gap

最后,或许也是最重要的一点,人工智能代理缺乏真正的伦理推理和判断能力。虽然可以通过规则和准则对其进行编程,但它们并不理解其决策更深层次的道德含义,也不理解通常需要考虑的复杂人为因素。

Finally, and perhaps most importantly, AI agents lack the capacity for true ethical reasoning and judgment. While they can be programmed with rules and guidelines, they don’t understand the deeper moral implications of their decisions or the complex human factors that often need to be considered.

这种局限性在医疗保健、刑事司法或招聘决策等领域尤为关键。人工智能代理可能会为了追求效率或统计模式而优化决策,却忽略了其决策对人类的影响。它甚至可能纯粹基于数字建议拒绝一笔贷款,忽略了可能使例外情况变得合理的人为因素。

This limitation becomes particularly critical in areas like healthcare, criminal justice, or hiring decisions. An AI agent might optimize for efficiency or statistical patterns without considering the human impact of its decisions. It might recommend denying a loan based purely on numbers, missing the human context that could make an exception appropriate.

前进之路:在这些限制条件下开展工作

The Path Forward: Working Within These Limitations

了解这些局限性并非否定人工智能代理的潜力,而是为了更有效地利用它们。通过认识到人工智能代理的能力范围,我们可以设计出既能发挥其优势又能保持适当的人工监督和干预的系统。在实践中,这意味着,例如:

Understanding these limitations isn’t about dismissing the potential of AI agents—it’s about using them more effectively. By recognizing what AI agents can and cannot do, we can design systems that leverage their strengths while maintaining appropriate human oversight and intervention. In practice, this means, for example:

创建混合系统,其中人工智能代理处理数据密集型任务,而人类则提供监督和判断。

Creating hybrid systems where AI agents handle data-intensive tasks while humans provide oversight and judgment

建立强有力的验证方法来识别可能的幻觉或错误,例如让不同的代理人评估其同伴的输出。

Establishing strong validation methods to identify possible hallucinations or mistakes, such as having various agents assess the outputs of their peers

应用策略来提高代理输入数据的质量

Applying strategies to enhance the quality of agents’ input data

在需要伦理推理或复杂背景理解的决策中,保持人的参与

Maintaining human involvement in decisions that require ethical reasoning or complex contextual understanding

提高透明度,使人们能够理解和评估智能体如何做出决策和采取行动。

Developing transparency so that humans can understand and evaluate how agents make decisions and take action
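To make the peer-validation idea in the list above concrete, here is a minimal Python sketch in which a "reviewer" agent checks a "worker" agent's answer against a trusted reference before accepting it. The drug list, the agent logic, and the claim format are all illustrative stand-ins, not a real LLM pipeline:

```python
# Minimal sketch of peer validation between agents: a "worker" produces an
# answer and a "reviewer" flags claims it cannot verify. Everything here is
# a mock; a real system would call LLMs and a curated knowledge source.

KNOWN_DRUGS = {"aspirin", "ibuprofen", "metformin"}  # trusted reference

def worker_agent(query: str) -> str:
    # Stand-in for an LLM that may hallucinate a non-existent drug.
    return "curelexa" if "rare" in query else "aspirin"

def reviewer_agent(answer: str) -> dict:
    # Peer check: verify the claim against the trusted reference source.
    verified = answer in KNOWN_DRUGS
    return {"answer": answer, "verified": verified,
            "action": "accept" if verified else "escalate_to_human"}

print(reviewer_agent(worker_agent("common headache")))  # accepted
print(reviewer_agent(worker_agent("rare condition")))   # escalated
```

The key design choice is that the reviewer never trusts the worker's confidence; it only trusts an external reference, and anything unverifiable is routed to a human rather than silently passed through.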

在本书接下来的章节中,我们将探讨如何在这些限制条件下最大限度地发挥人工智能代理的优势,并探索切实可行的策略。

In the following parts of the book, we’ll explore practical strategies for working within these limitations while maximizing the benefits that AI agents can provide.

当一个智能体不足以满足需求时:多智能体系统的力量与实践

When One Is Not Enough: The Power and Practice of Multi-Agent Systems

在人工智能领域,我们常常想象一个强大的人工智能系统来解决复杂的问题。但有时,单个智能体——即使是人工智能——也远远不够。正如人类通过团队合作完成艰巨任务一样,我们发现,人工智能智能体团队的协作也能取得卓越的成果。欢迎来到迷人的多智能体系统(MAS)世界,在这里,人工智能之间的协作正在彻底改变我们解决复杂问题的方式。

In the world of artificial intelligence, we often imagine a single, powerful AI system tackling complex problems. But sometimes, one mind—even an artificial one—isn’t enough. Just as humans work in teams to accomplish difficult tasks, we’re discovering that teams of AI agents working together can achieve remarkable results. Welcome to the fascinating world of multi-agent systems (MAS), where collaboration between artificial minds is revolutionizing how we solve complex problems.

人工智能交响乐团

The Orchestra of Artificial Minds

可以将多智能体系统想象成一个交响乐团。每个乐手演奏不同的乐器,遵循各自的乐谱,但他们共同演奏出和谐的乐章。类似地,多智能体系统由多个自主软件智能体组成,每个智能体都有其独特的角色和能力,它们协同工作,共同完成单个智能体难以甚至无法独自完成的目标。

Think of a multi-agent system like an orchestra. Each musician plays a different instrument, following their own sheet music, yet together, they create a harmonious performance. Similarly, a multi-agent system consists of multiple autonomous software agents, each with its own role and capabilities, working in concert to achieve goals that would be difficult or impossible for any single agent to accomplish alone.

让我们分享一个真实案例,以便更好地阐释接下来的讨论。一年前,我们与一家大型金融服务公司合作,对其客户服务运营进行转型。传统的做法是创建一个庞大的人工智能系统来处理所有事务——从理解客户查询、访问账户信息到生成回复。而我们则实施了一个多代理系统,由不同的专业代理负责客户服务的不同方面。一个代理专注于自然语言理解,另一个负责检索相关的账户信息,还有一个负责撰写回复,还有一个负责确保符合金融法规。就像交响乐团中的乐手一样,每个代理都有自己的专长,但他们齐心协力,共同打造了流畅无缝的客户服务体验。

Let us share a real-world example that will help illustrate these concepts throughout our discussion. A year ago, we worked with a major financial services company to transform its customer service operations. The traditional approach would have been to create one massive AI system to handle everything—from understanding customer queries to accessing account information to generating responses. Instead, we implemented a multi-agent system where different specialized agents handled different aspects of customer service. One agent focused on natural language understanding, another on retrieving relevant account information, another on composing responses, and yet another on ensuring compliance with financial regulations. Just like musicians in an orchestra, each agent had its specialty, but together, they created a seamless customer service experience.

为什么需要多个代理?

Why Multiple Agents?

你可能会想:既然或许可以构建一个包办一切的强大人工智能系统,为什么还要费力创建多个代理呢?答案在于实践和理论两方面的考量。

You might wonder: Why go through the complexity of creating multiple agents when you could potentially build one powerful AI system to do everything? The answer lies in both practical and theoretical considerations.

首先,需要考虑复杂性管理。现代业务流程极其复杂,涉及众多环节和相互依赖关系。例如,从订单到收款这样的跨职能端到端流程可能包含数千个具体任务。将这些流程分解成更小、更易于管理的模块,由各个代理负责,可以使整个系统更易于管理和维护。以金融服务为例,为不同职能部门配备独立的代理意味着,当法规发生变化时,我们可以在不影响其他组件的情况下更新合规代理,或者在不影响账户信息检索系统的情况下改进语言理解代理。

First, there’s the matter of complexity management. Modern business processes are incredibly complex, with many moving parts and interdependencies. A cross-functional, end-to-end process like order-to-cash might involve thousands of specific tasks. Breaking down these processes into smaller, manageable pieces that individual agents can handle makes the overall system more manageable and maintainable. In our financial services example, having separate agents for different functions meant we could update the compliance agent when regulations changed without touching the other components or improve the language understanding agent without risking disruption to the account information retrieval system.

其次,专业化也具有优势。斯坦福大学人工智能实验室的研究表明,专业化的智能体在特定任务上的表现通常优于通用型系统。<sup> 59</sup>每个智能体都可以针对其特定角色进行优化,使用最适合其功能的算法和方法。例如,我们的语言理解智能体使用了先进的自然语言处理模型,而合规性智能体则采用了更易于审计和验证的基于规则的逻辑。

Second, there’s the benefit of specialization. Research from Stanford’s AI Lab has shown that specialized agents often perform better at specific tasks than generalist systems.59 Each agent can be optimized for its particular role, using the most appropriate algorithms and approaches for its specific function. Our language understanding agent, for instance, used advanced natural language processing models, while the compliance agent employed rule-based logic that was easier to audit and verify.

第三,多智能体系统具有更强的弹性。根据最近的研究,具有多个智能体的分布式系统通常比集中式系统更稳健。如果一个智能体发生故障或需要维护,其他智能体通常可以继续运行,甚至可以进行调整以弥补其功能缺口。60 当我们的账户信息检索代理偶尔需要维护时,系统仍然能够理解客户查询并提供一般信息,性能会优雅地下降,而不是完全失效。

Third, multi-agent systems offer superior resilience. According to recent research, distributed systems with multiple agents are generally more robust than centralized systems. If one agent fails or needs maintenance, the others can often continue functioning or even adapt to cover the gap.60 When our account information retrieval agent needed occasional maintenance, the system could still understand customer queries and provide general information, degrading gracefully rather than failing completely.

组织数字交响乐团:多智能体团队模型

Organizing the Digital Orchestra: Models for Multi-Agent Teams

正如人类组织可以采用不同的结构——从扁平化层级到传统的金字塔结构——多智能体系统也可以遵循不同的组织模型。模型的选择会显著影响智能体之间的交互方式以及系统整体的运行效率。

Just as human organizations can be structured in different ways—from flat hierarchies to traditional pyramids—multi-agent systems can follow different organizational models. The choice of model significantly impacts how agents interact and how effectively the system operates as a whole.

单一代理人,单一工具原则

The One Agent, One Tool Principle

近年来,我们最重要的发现之一是代理设计中简洁性的力量。我们发现,最可靠的方法其实非常简单:为每个特定工具分配一个代理。当一个项目需要多个工具时(这种情况越来越普遍),我们会为每个工具创建一个专用代理,外加一个协调代理来协调它们之间的交互。

One of our most important discoveries in recent years has been the power of simplicity in agent design. We’ve found that the most reliable approach is remarkably straightforward: assign one agent to handle one specific tool. When a project requires multiple tools—which is increasingly common—we create one dedicated agent per tool plus a coordinator agent to orchestrate their interactions.

“一人一工具”原则已成为我们的标准做法,因为它始终能带来成效。如果您需要使用三种不同的工具,请创建三个专业代理人和一个协调员。每个代理人都将成为其特定工具的专家,负责处理该特定界面的所有细微差别和特殊情况。协调员则负责管理这些专家之间的工作流程。

This “one agent, one tool” principle has become our standard practice because it consistently delivers results. If you need to work with three different tools, create three specialized agents and one coordinator. Each agent becomes an expert in its specific tool, handling all the nuances and edge cases of that particular interface. The coordinator agent then manages the flow of work between these specialists.

在我们的金融服务实施过程中,这一原则被证明至关重要。我们没有创建需要同时处理多个系统的复杂代理,而是为每个后端系统配备了专用代理:一个用于客户数据库,一个用于交易处理系统,一个用于文档管理系统。一个协调代理负责管理整个工作流程。这种方法显著提高了系统的可管理性。每个代理的指令都更加清晰明确,代理之间的交互也更容易处理和调试。最重要的是,每个代理都能真正精通其领域——专注于做好一件事。

In our financial services implementation, this principle proved invaluable. Instead of creating complex agents that juggled multiple systems, we had dedicated agents for each backend system: one for the customer database, another for the transaction processing system, and another for the document management system. A coordinator agent managed the workflow between them. This approach made the system significantly more manageable. The instructions for each agent were clearer and more focused. The interactions between agents were easier to handle and debug. Most importantly, each agent could truly master its domain—doing one thing but doing it right.
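The "one agent, one tool" pattern can be sketched in a few lines of Python. The dictionaries below stand in for the real backend systems, and all names are illustrative; the point is that each specialist touches exactly one tool while the coordinator holds only workflow logic:

```python
# Sketch of the "one agent, one tool" principle: each agent wraps exactly
# one backend system, and a coordinator routes a request through the
# specialists. The dictionaries are stand-ins for real systems.

customer_db = {"C42": {"name": "Ada", "balance": 120.0}}
transactions = []

def customer_agent(customer_id):
    # Sole job: talk to the customer database.
    return customer_db[customer_id]

def transaction_agent(customer_id, amount):
    # Sole job: talk to the transaction system.
    transactions.append({"customer": customer_id, "amount": amount})
    return len(transactions) - 1  # transaction index as a receipt

def coordinator(customer_id, amount):
    # Orchestrates the specialists; holds no tool logic of its own.
    profile = customer_agent(customer_id)
    if profile["balance"] < amount:
        return {"status": "declined", "reason": "insufficient funds"}
    receipt = transaction_agent(customer_id, amount)
    return {"status": "ok", "receipt": receipt, "customer": profile["name"]}

print(coordinator("C42", 50.0))   # ok
print(coordinator("C42", 500.0))  # declined
```

Because the coordinator never queries a database or posts a transaction directly, either specialist can be rewritten, audited, or taken down for maintenance without the others noticing anything beyond a changed interface.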

层级模型

The Hierarchical Model

从更广泛的组织架构角度来看,我们采用了一种称为层级式组织模型的模式,事实证明这种模式非常符合我们的需求。想象一下这样的公司结构:在顶层,我们设有一个协调代理,负责管理客户互动的整体流程。在其下方,是分别专注于客户服务不同方面的部门级代理;再往下,则是负责具体任务的工作代理。

From a broader organizational point of view, we used what’s called a hierarchical organization model, which proved particularly effective for our needs. Picture a corporate structure: at the top, we had a coordinator agent who managed the overall flow of customer interactions. Below it, we had department-level agents specializing in different aspects of customer service, and below those were individual worker agents handling specific tasks.

该模型在集中式和分散式方法之间取得了平衡。代理以层级形式组织,上层代理协调下层代理的活动。这种结构兼顾了局部自主性和全局协调性,能够很好地满足许多复杂的业务应用需求。

This model strikes a balance between centralized and decentralized approaches. Agents are organized in layers, with higher-level agents coordinating the activities of those below them. This provides a mix of local autonomy and global coordination that works well for many complex business applications.

然而,这种层级式方法并非组织多智能体系统的唯一途径。让我们来探讨研究人员和实践者认为有效的三种主要组织模型——请注意,我们并未尝试过这些模型;这仅供参考。

However, this hierarchical approach isn’t the only way to organize multi-agent systems. Let’s explore three main organizational models that researchers and practitioners have found effective—note that we have not tried them; this is just for your reference.

集中控制

Centralized Control

在这个模型中,一个主体——统筹者——扮演着指挥者的角色,如同我们这个比喻中的交响乐团的指挥家,负责指挥和协调所有其他主体。一些实践者认为,这个核心主体维护着系统状态和目标的全局视图,并做出由其他主体执行的高层决策。虽然这可以确保系统的高度一致性和清晰的决策,但也可能造成瓶颈和单点故障——试想一下,如果指挥在交响乐演出期间突然消失,会是什么情形。61

In this model, one agent—the orchestrator—acts as the conductor of our metaphorical orchestra, directing and coordinating all other agents. According to some practitioners, this central agent maintains a global view of the system’s state and goals, making high-level decisions that other agents execute. While this can provide strong consistency and clear decision-making, it can also create a bottleneck and single point of failure—imagine if the conductor suddenly disappeared during a symphony performance.61

去中心化协作

Decentralized Collaboration

相反,我们也有完全去中心化的系统,其中所有主体都是对等的,通过直接通信进行协调,无需任何中央权威机构。可以把它想象成一个爵士乐团,乐手们实时互动,无需指挥。一些实践者认为,这种方法具有出色的可扩展性和弹性,但需要复杂的协调协议来确保主体有效协作。<sup> 62</sup>而且,就像爵士乐演出一样,结果有时难以预测。

Conversely, we have fully decentralized systems where all agents are peers, coordinating through direct communication without any central authority. Think of it like a jazz ensemble where musicians respond to each other in real-time without a conductor. According to some practitioners, this approach offers excellent scalability and resilience but requires sophisticated coordination protocols to ensure agents work together effectively.62 And, as in jazz performances, the outcome can sometimes be unpredictable.

使其奏效:关键成功因素

Making It Work: Critical Success Factors

凭借我们的经验和在该领域广泛的研究,我们已经确定了决定多智能体系统实施成功与否的几个关键因素。

Through our experience and backed by extensive research in the field, we’ve identified several critical factors that determine the success of a multi-agent system implementation.

清晰的沟通协议

Clear Communication Protocols

就像音乐家需要对乐谱和节奏有共同的理解一样,代理人也需要明确的协议来共享信息和协调行动。

Just as musicians need a common understanding of musical notation and timing, agents need well-defined protocols for sharing information and coordinating actions.

在我们的金融服务系统中,系统在第一天就彻底崩溃了,因为尽管各个客服人员的设计都很出色,但他们却无法有效地共享信息——这真是一次令人难忘的失败!这些客服人员就像一群才华横溢的专家,却都说着不同的语言。

In our financial services system, the system crashed spectacularly on day one because the agents, though individually well-designed, couldn’t effectively share information—a particularly memorable failure! The agents were like a team of brilliant experts who all spoke different languages.

从这次经历中,我们学会了在设计稳健的通信协议方面投入大量资源。这不仅仅是定义消息格式的问题——代理需要共享上下文、明确的处理误解的协议,以及验证彼此是否正确理解对方信息的方法。我们现在始终包含所谓的“翻译代理”,它们的唯一职责是确保系统不同部分之间的顺畅沟通。

From this experience, we learned to invest heavily in designing robust communication protocols. It’s not just about defining message formats—agents need shared context, clear protocols for handling miscommunication, and ways to verify they’ve understood each other correctly. We now always include what we call “translator agents,” whose sole job is to ensure smooth communication between different parts of the system.

在我们的金融服务系统中,我们实施了一套复杂的消息传递协议,使客服人员能够以结构化、可靠的方式共享客户互动信息、账户状态和响应计划。多篇相关研究论文提出了相应的协议和详细的实施方法。63

In our financial services system, we implemented a sophisticated message-passing protocol that allowed agents to share information about customer interactions, account status, and response plans in a structured, reliable way. Several research papers on the topic suggest protocols and detailed approaches for implementation.63
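The actual protocol we used is not reproduced here, but the general shape of such message passing can be sketched as follows, assuming a simple envelope with sender, recipient, type, and payload fields (all illustrative). The essential discipline is that malformed messages are rejected at the boundary rather than silently misinterpreted:

```python
# Sketch of a structured message protocol between agents: every message is
# an envelope with an agreed set of fields, and delivery fails fast when
# the schema is violated. Field names are illustrative.

REQUIRED_FIELDS = {"sender", "recipient", "type", "payload"}

def validate_message(msg: dict) -> bool:
    # Schema check: all required envelope fields must be present.
    return REQUIRED_FIELDS <= msg.keys()

def send(inbox: list, msg: dict) -> bool:
    if not validate_message(msg):
        return False          # malformed: never delivered
    inbox.append(msg)
    return True

compliance_inbox = []
ok = send(compliance_inbox, {
    "sender": "response_agent", "recipient": "compliance_agent",
    "type": "review_request",
    "payload": {"draft": "Your loan is approved."}})
bad = send(compliance_inbox, {"text": "free-form note"})  # rejected
```

In a production system the envelope would also carry correlation IDs and shared context, but even this minimal version prevents the "brilliant experts speaking different languages" failure described above.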

有效的协调机制

Effective Coordination Mechanisms

仅仅沟通是不够的——关键在于如何运用沟通。智能体经常需要协调行动,尤其是在任务重叠或共享资源的情况下。如果没有协调,可能会出现两架送货无人机同时向同一地址送货,而忽略另一个地址的情况——一片混乱!因此,实施协调策略,使智能体能够协同工作,是成功的关键因素。协调策略的范围很广,从简单的调度规则(例如:智能体 A 先行动,然后是智能体 B)到复杂的算法(例如:多智能体规划或协商协议)。

Communication alone isn’t enough—it’s what you do with it that counts. Agents will frequently need to coordinate their actions, especially when their tasks intersect or they share resources. Without coordination, you might get two delivery drone agents trying to deliver to the same address while another address is ignored—chaos! Thus, a critical success factor is implementing coordination strategies so agents can work in harmony. This can range from simple approaches (like scheduling rules: Agent A goes first, then Agent B) to complex algorithms (like multi-agent planning or negotiation protocols).

关键在于预测多智能体系统中可能出现的冲突或重叠,并为智能体配备公平高效的解决方法。在实践中,我们经常模拟最坏情况(例如多个智能体争夺单一资源或智能体目标冲突)来测试协调逻辑。测试在此至关重要。当您发现协调问题时,需要采取措施来解决问题。如果出现问题,请修改代理人的指令并重试,直到在所有情况下都能正常工作。

The key is to anticipate where conflicts or overlaps might occur in your MAS and equip the agents with a way to resolve them fairly and efficiently. In practice, we often simulate worst-case scenarios (many agents converging on a single resource or agents with conflicting goals) to test our coordination logic. This is where testing is paramount. When you identify a coordination issue, you amend the agents’ instructions and try again until it works in all cases.
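As a toy illustration of the delivery-drone scenario, the sketch below coordinates agents through a first-come claim on a shared resource, so no address is served twice while another is ignored. This is the simplest end of the coordination spectrum mentioned above; names and the claim rule are illustrative:

```python
# Minimal coordination mechanism: agents claim a shared resource (a
# delivery address) through a first-come lock. Without this, two drones
# could take the same job while another address goes unserved.

claimed = {}

def claim(agent: str, address: str) -> bool:
    if address in claimed:
        return False                  # someone got there first
    claimed[address] = agent
    return True

addresses = ["12 Oak St", "98 Elm St"]
for drone in ["drone_a", "drone_b"]:
    for addr in addresses:
        if claim(drone, addr):
            break                     # each drone takes the first free job

print(claimed)  # every address assigned to exactly one drone
```

Negotiation protocols and multi-agent planners replace this naive lock when agents have conflicting priorities, but the testing discipline is the same: simulate the contention and verify no resource is double-assigned or dropped.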

鲁棒性和错误恢复

Robustness and Error Recovery

事情总会出错——智能体可能会崩溃,网络可能会故障,错误的输入或软件漏洞也可能导致意想不到的行为。一个稳健的多智能体系统(MAS)必须在这些故障发生后仍能继续运行。这首先要消除单点故障;完全集中式的MAS架构非常脆弱,因此关键智能体(例如中央协调器)应该具备故障转移机制。例如,在一个项目中,我们实现了一个备用协调器,当主协调器在现场演示中发生故障时,备用协调器能够接管运行——从而避免了灾难性的后果。

Things will go wrong—agents may crash, networks may fail, and unexpected behaviors can emerge from bad input or software bugs. A robust multi-agent system (MAS) must continue functioning despite these failures. This starts with eliminating single points of failure; fully centralized MAS architectures are brittle, so critical agents, like central coordinators, should have failover mechanisms. For example, in one project, we implemented a backup coordinator that took over when the primary failed during a live demo—saving us from disaster.
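A failover arrangement like the one just described can be sketched as follows; the coordinators here are plain functions standing in for separate services with health checks, and the failure is simulated rather than real:

```python
# Sketch of coordinator failover: if the primary coordinator raises, a
# backup transparently takes over; if all fail, the task is escalated to
# a human instead of being lost.

def primary_coordinator(task):
    raise RuntimeError("primary down")   # simulate a crash mid-demo

def backup_coordinator(task):
    return f"handled by backup: {task}"

def dispatch(task, coordinators):
    for coord in coordinators:
        try:
            return coord(task)
        except Exception:
            continue                     # fail over to the next coordinator
    return "all coordinators down: escalate to human"

print(dispatch("route customer query",
               [primary_coordinator, backup_coordinator]))
```
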

最后,为防止级联故障,代理应使用健全性检查或信任模型来验证信息。如果某个代理开始出现异常行为,其他代理应能够标记或忽略不可靠的数据,或将其标记为需要人工审核,从而确保系统保持稳定有效。

Finally, to prevent cascading failures, agents should validate information using sanity checks or trust models. If one agent starts behaving erratically, others should be able to flag or ignore unreliable data or flag it for human review, ensuring the system remains stable and effective.

MAS中的涌现行为

Emergent Behaviors in MAS

让我们进一步概述多智能体系统(MAS),并了解一下这个激动人心的领域近期的发展。有趣的是,智能体群体有时会展现出涌现行为——即并非显式编程的复杂策略或解决方案。当智能体协作甚至仅仅是共存时,它们就能形成一种集体智能。例如,在一个多智能体共同学习资源收集博弈的实验中,这些智能体开始隐式地协调,在没有中央控制器的情况下发展出群体行为——本质上是一种涌现的团队合作。<sup> 64</sup>

Let us push our overview of MAS further by having a glimpse at recent developments in this exciting field. Interestingly, groups of agents can sometimes exhibit emergent behaviors—complex strategies or solutions that weren’t explicitly programmed. When agents collaborate or even just coexist, they can form a kind of collective intelligence. For example, in one experiment with many agents learning together in a resource-gathering game, the agents began coordinating implicitly, developing group behaviors without a central controller—essentially an emergent teamwork.64

合作可以让智能体共同解决单个智能体无法解决的问题。我们甚至已经看到了人工智能智能体与人类合作的早期案例。Meta AI 的 CICERO 智能体在棋盘游戏“外交”中达到了人类水平。“外交”要求玩家进行谈判、结盟,有时甚至背叛彼此。CICERO 将自然语言(用于在游戏中与人类玩家交流)与战略推理相结合,制定计划。在一个由人类玩家参与的在线“外交”联赛中,CICERO 成功跻身前 10%,甚至与一些并未意识到它是人工智能的人类玩家结盟。这表明人工智能智能体能够参与复杂的社会协作——在群体环境中进行谈判、说服和协调行动。<sup> 65</sup>

Cooperation can allow agents to tackle problems together that are too hard for one agent. We even see early examples of AI agents teaming up with humans. Meta AI’s CICERO is an agent that achieved human-level performance in the board game Diplomacy, which requires players to negotiate, form alliances, and sometimes betray each other. CICERO combined natural language (to talk with human players in-game) with strategic reasoning to make plans. In an online Diplomacy league with human players, CICERO managed to rank in the top 10% of players, even forming alliances with humans who didn’t realize it was an AI. This demonstrates that AI agents can engage in complex social collaboration—negotiating, persuading, and coordinating actions in a group setting.65

多智能体交互开辟了新的可能性:试想一下,一支由不同领域的医疗诊断人工智能组成的团队,彼此协作,共同得出全面的患者诊断;又或者,一群自主无人机相互通信,高效地覆盖和监控大片区域。然而,这也带来了新的挑战,例如如何确保智能体之间有效沟通并与人类目标保持一致。合作人工智能领域的研究人员正在积极探索如何设计出能够在智能体和人类混合群体中可靠协作、做出公平决策,甚至理解人类规范的智能体。

Multi-agent interactions open up new possibilities: imagine a team of medical diagnostic AIs, each specialized in a different field, consulting with each other to come to a comprehensive patient diagnosis, or fleets of autonomous drones that communicate to efficiently cover and monitor a large area. However, they also introduce new challenges, like how to ensure the agents communicate effectively and align with human goals. Researchers in cooperative AI are actively exploring how to design agents that can cooperate reliably, make fair decisions, and even understand human norms when in a mixed group of agents and people.

多智能体对话实验

Experimenting with Multi-Agent Conversations

为了真正理解人工智能体如何交互,我们建议做一个简单的实验。这个实验展示了两个人工智能体如何沟通、协商,甚至相互挑战——从而揭示涌现的对话行为。

To truly grasp how AI agents interact, we suggest a simple experiment. This exercise demonstrates how two AI agents might communicate, negotiate, or even challenge each other—uncovering emergent dialogue behaviors.

要进行这项实验,请同时打开两个 AI 聊天程序,例如 ChatGPT 或 Claude(例如,打开两个浏览器窗口,每个窗口显示一个聊天机器人)。并排运行这两个 AI 聊天程序,可以模拟不同角色和目标的代理之间进行的结构化对话。

To conduct this experiment, open two instances of an AI chat, such as ChatGPT or Claude (for example, open two browser windows, each displaying a chatbot). Running these two AI chat instances side by side allows you to simulate a structured conversation between agents with different roles and goals.

首先,指派代理人 A 担任财务顾问,试图说服持怀疑态度的客户代理人 B 采用储蓄计划。由于人工智能无法直接沟通,您将扮演信使的角色,来回传递回复(通过复制粘贴)。随着对话的进行,观察代理人的互动方式——谈判、反驳,甚至说服。代理人 A 是运用逻辑、安抚还是说服技巧?代理人 B 是仍然心存疑虑,还是最终妥协?

To start, assign Agent A as a financial advisor trying to convince Agent B, a skeptical client, to adopt a savings plan. Since the AIs can’t directly communicate, you’ll act as the messenger, relaying responses back and forth (copying and pasting them). As the conversation unfolds, watch how the agents engage—negotiating, counter-arguing, or even persuading. Does Agent A use logic, reassurance, or persuasion tactics? Does Agent B remain doubtful or eventually concede?

这种来回交流让我们得以一窥多智能体系统,其中人工智能智能体可以协作、辩论或共同决策。虽然这种交互是模拟的,但它突显了措辞和语境如何影响人工智能的行为——这是设计自主人工智能智能体时必须考虑的关键因素。

This back-and-forth exchange offers a glimpse into multi-agent systems, where AI agents collaborate, debate, or make joint decisions. While the interaction is a simulation, it highlights how phrasing and context shape AI behavior—an essential consideration in designing autonomous AI agents.
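For readers who would rather script the relay than copy and paste by hand, the loop below shows its structure, with canned stub replies standing in for real chatbot calls (no API is invoked; the replies are fixed strings):

```python
# Sketch of the two-agent relay experiment: alternate turns between an
# "advisor" and a "client", carrying the last message across each time.
# The reply functions are stubs; swap in real chatbot calls to run the
# actual experiment.

def advisor_reply(message: str) -> str:
    return "A savings plan protects you when income dips."

def client_reply(message: str) -> str:
    return "I'm still not convinced the returns beat inflation."

transcript = []
message = "Hello, I'd like to discuss a savings plan."
for turn in range(4):
    speaker, respond = (("advisor", advisor_reply) if turn % 2 == 0
                        else ("client", client_reply))
    message = respond(message)       # you, the messenger, relay it across
    transcript.append((speaker, message))

for speaker, text in transcript:
    print(f"{speaker}: {text}")
```
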

多智能体系统或将很快成为常态

Multi-Agent Systems Might Soon Become the Norm

展望未来,多智能体系统在人工智能驱动的世界中正变得日益重要。来自顶尖机构的研究表明,随着问题变得更加复杂和分散,多智能体方法将变得更加关键。关键在于理解何时以及如何有效地应用这种强大的范式。66

As we look ahead, multi-agent systems are becoming increasingly important in our AI-driven world. Research from leading institutions suggests that as problems become more complex and distributed, the multi-agent approach will become even more crucial. The key is understanding when and how to apply this powerful paradigm effectively.66

我们的金融服务实施方案不断发展演进,不断新增代理以应对社交媒体等新兴渠道,并支持预测性服务推荐等新功能。该系统的模块化多智能体架构已被证明能够很好地适应这些不断变化的需求。

Our financial services implementation continues to evolve, with new agents being added to handle emerging channels like social media and new capabilities like predictive service recommendations. The system’s modular, multi-agent architecture has proven remarkably adaptable to these changing requirements.

未来或许不属于单一的、庞然大物般的AI系统,而属于由众多专业智能体组成的复杂团队,它们协同合作,和谐共事。智能体生态系统可能会发展壮大,有时甚至会跨越组织边界。随着我们不断拓展人工智能的边界,人工智能智能体的协作将在解决我们面临的最复杂挑战中发挥日益重要的作用。

The future may well belong not to single, monolithic AI systems, but to sophisticated teams of specialized agents working together in harmony. Agent ecosystems are likely to develop, sometimes crossing organizational boundaries. As we continue to push the boundaries of what’s possible with artificial intelligence, the orchestra of AI minds will play an increasingly important role in solving our most complex challenges.

代理人的困境:如何在创造力和可靠性之间取得平衡

The Agent’s Dilemma: Balancing Creativity with Reliability

在我们为一家金融服务公司实施人工智能代理的早期阶段,我们遇到了一个情况,它完美地诠释了人工智能代理领域最引人入胜的挑战之一。该公司希望使用基于大语言模型(LLM)的人工智能代理来实现应付账款流程的自动化。最初的成果令人印象深刻——代理能够理解复杂的发票,将其与采购订单匹配,甚至能够以惊人的技巧处理异常情况。然而,有一天,代理决定“优化”付款计划,创建了一个它认为更高效的付款方案。虽然这个方案很有创意,但这并非会计部门的预期。

Early in our journey of implementing AI agents for a financial services company, we encountered a situation that perfectly crystallized one of the most fascinating challenges in the field of AI agents. The company wanted to automate its accounts payable process using LLM-powered AI agents. The initial results were impressive—the agent could understand complex invoices, match them with purchase orders, and even handle exceptions with surprising sophistication. However, one day, the agent decided to “optimize” the payment schedule by creating what it considered a more efficient payment plan. While creative, this wasn’t what the accounting department had in mind.

理解代理人的困境

Understanding the Agent’s Dilemma

我们把这种现象称为“智能体困境”——正是那些赋予智能体强大能力的大语言模型(LLM)创造性能力——例如推理、理解上下文和提前规划——也使得它们在某些方面变得不可靠,而这是传统的基于规则的自动化从未出现过的。这一挑战正是当前处于智能体发展框架第三级的AI智能体实现的核心所在。

We’ve come to call this phenomenon the “Agent’s Dilemma”—the same creative LLM capabilities that make them powerful—their ability to reason, understand context, and plan ahead—can also make them unreliable in ways that traditional rule-based automation never was. It’s a challenge that sits at the heart of current AI agent implementations operating at Level 3 of the Agentic AI Progression Framework.

这可以说是人工智能代理最显著的局限性之一,而且我们并非唯一面临此问题的人。根据Langchain的一项调查,性能质量被认为是首要问题,45%的受访者强调了这一点。<sup> 67</sup>此外,Pegasystems的研究发现,42%的受访者认为提高人工智能代理工具的准确性和可靠性是改进的首要任务。<sup> 68</sup>

This is arguably one of the most significant limitations of AI agents, and we are not the only ones to have faced this issue. According to a Langchain survey, performance quality is identified as the primary issue, with 45% of respondents highlighting it.67 Additionally, 42% of workers identified enhanced accuracy and reliability as the top priority for improvement in agentic AI tools, as found in research by Pegasystems.68

这有点像聘用了一位才华横溢但又有点古怪的员工,他可能会在未经允许的情况下,擅自将你整个文件系统重新整理成“更美观”的格式。我们合作过的一位首席技术官对此感到非常担忧。他曾一针见血地指出:“你是说你想用一个能写诗的系统来运行我的核心业务流程?这听起来就像聘请莎士比亚来帮我报税——风险太大了!”

It’s a bit like hiring a brilliant but somewhat eccentric employee who might decide to reorganize your entire filing system into a “more aesthetic” arrangement without asking. One CTO we worked with was terribly scared of this. He memorably put it: “You’re telling me you want to use a system that can write poetry to run my core business processes? That sounds like hiring Shakespeare to do my taxes—very risky!”

顺便一提,这个过程与人脑的运作方式惊人地相似。研究表明,帮助我们规划一天的认知功能,同样也能让我们设想各种不同的情景并提出创造性的解决方案。这种认知功能的重叠表明,提升某一领域的技能有可能增强另一领域的表现,这凸显了创造力和规划能力之间的相互关联性。69

On a side note, this process is remarkably similar to how human brains work. Research demonstrates that the same cognitive functions that help us plan our day also allow us to imagine alternate scenarios and come up with creative solutions. This overlap in cognitive functions suggests that improving skills in one domain could potentially enhance performance in the other, underlining the interconnected nature of creativity and planning abilities.69

人工智能代理中随机性的本质

The Nature of Stochasticity in AI Agents

这一挑战的根源在于,大语言模型(LLM)并非简单地遵循规则;它们基于从海量训练数据中学习到的模式生成响应。这赋予了它们在复杂情况下理解上下文和推理的强大能力。然而,这也意味着它们本质上是概率性的,而非确定性的。每一个响应都是一次创造性的生成过程,而不仅仅是规则表中的查找。

The root of this challenge lies in the fact that LLMs don’t simply follow rules; they generate responses based on patterns learned from massive amounts of training data. This gives them an impressive ability to understand context and reason in complex situations. However, it also means they’re inherently probabilistic rather than deterministic. Each response is a creative act of generation, not just a lookup in a rule table.

大语言模型(LLM)的这一特性被称为随机性。它指的是LLM生成响应过程中固有的随机性。即使被问到同一个问题,这些模型每次也不会产生完全相同的答案。相反,它们的输出会受到生成过程中分配给不同词语或短语的概率的影响。让我们深入探讨一下这种现象。

This characteristic of LLMs is called stochasticity. It refers to the randomness inherent in the way LLMs generate responses. These models don’t produce the exact same answer every time, even when asked the same question. Instead, their outputs are influenced by probabilities assigned to different words or phrases during the generation process. Let us dive deeper to understand this behavior.

我们来做一个实验。连续四次向你首选的基于LLM的AI聊天机器人(例如ChatGPT、Claude或Gemini)提出以下问题:

Let’s try an experiment. Prompt your preferred LLM-based AI chatbot (such as ChatGPT, Claude, or Gemini) four times in a row with the following:

“请用一种工具的名称完成这句话:作为人工智能代理,为了编辑图像,我需要使用___。”

“Complete this sentence with the name of one tool: In order to edit an image, as an AI agent, I need to use ___.”

根据概率,你每次都会得到不同的结果:

Based on probabilities, you will have different responses each time:

“Photoshop”有 50% 的概率(最常见、最专业的选项)。

“Photoshop” with a 50% probability (most common, professional option).

“GIMP”有30%的概率(一款流行的免费替代方案)。

“GIMP” with a 30% probability (popular free alternative).

“Canva”有15%的概率(更简单、更注重设计的选项)。

“Canva” with a 15% probability (simpler, design-oriented option).

其他工具(例如 MS Paint、Figma 等)可能占剩余的 5%。

Other tools (e.g., MS Paint, Figma, etc.) might make up the remaining 5%.

就像掷加权骰子一样,即使“Photoshop”是最有可能的选择,人工智能也不总是选择它。

Just like a roll of weighted dice, the AI doesn’t always choose “Photoshop” even though it’s the most probable option.
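The weighted-dice behavior can be reproduced directly with the illustrative probabilities above; `random.choices` samples proportionally to the weights, so “Photoshop” is the most likely draw but never guaranteed on any single roll:

```python
# Sketch of weighted sampling, mirroring how an LLM picks among candidate
# next tokens. The tool names and probabilities are the illustrative
# figures from the text, not real model statistics.

import random

tools = ["Photoshop", "GIMP", "Canva", "Other"]
weights = [0.50, 0.30, 0.15, 0.05]

random.seed(0)  # fixed seed only so this sketch is reproducible
draws = [random.choices(tools, weights=weights)[0] for _ in range(1000)]

print(draws[:5])                        # individual draws vary
print(draws.count("Photoshop") / 1000)  # converges near 0.5 over many draws
```

Any single draw is unpredictable, yet the long-run frequencies match the weights: exactly the mix of variability and statistical regularity that makes LLM outputs feel natural in one reply and inconsistent across replies.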

为什么随机性很重要

Why Stochasticity Matters

你或许会问:为什么要在人工智能系统中引入不可预测性?我们难道不希望它们稳定可靠吗?在人工智能系统中引入不可预测性有三个关键目的:

You might be wondering: why build unpredictability into AI systems at all? Wouldn’t we want them to be consistent and reliable? The inclusion of unpredictability in AI systems serves three crucial purposes:

首先,它能实现自然的交互。正如人类不会每次都给出机械式的相同回答一样,人工智能的回答也应有所变化,这样才能让交互感觉更自然、更吸引人。我们在客户服务方面的实际应用中已经亲身验证了这一点。

First, it enables natural interactions. Just as humans don’t give robotically identical responses each time, some variation in AI responses makes interactions feel more natural and engaging. We’ve seen this firsthand in our customer service implementations.

其次,它有助于创造性地解决问题。面对复杂的挑战,能够提出不同的方法可以带来更好的解决方案。例如,能够提出多种故障排除方案,可以帮助服务台人工智能代理发现解决设备问题的新方法。

Second, it facilitates creative problem-solving. When facing complex challenges, having the ability to generate different approaches can lead to better solutions. For example, the ability to suggest varied troubleshooting approaches leads a helpdesk AI agent to discover novel solutions to equipment problems.

第三,随机性对学习和适应至关重要。虽然我们尚未在生产环境中看到这一点,但在我们的演进框架的第 4 层和第 5 层中,随机性在智能体的学习和适应过程中发挥着至关重要的作用。就像人类一样,它们通过尝试不同的方法并从各种经验中学习来不断进步。

Third, it’s crucial for learning and adaptation. While we’re not yet seeing this in production environments, in levels 4 and 5 of our Progression Framework, stochasticity plays a vital role in how agents learn and adapt. Just like humans, they improve by trying different approaches and learning from diverse experiences.

人工智能代理:随机性问题

AI Agents: The Issues with Stochasticity

大语言模型(LLM)之所以强大,是因为它们具有随机性,能够生成多样化且富有创造性的响应。然而,正是这种特性也带来了不一致性和不精确性,当基于LLM的AI智能体被赋予需要可靠性的现实世界职责时,便会面临严峻的挑战。让我们对此进行更详细的分析。

LLMs are powerful because of their stochastic nature, which allows them to generate diverse and creative responses. However, this very trait introduces inconsistency and imprecision, posing significant challenges when LLM-based AI agents are tasked with real-world responsibilities that demand reliability. Let us analyze this in more detail.

一致性:确保重复操作的可靠性。人工智能代理在执行相同或相似指令时必须提供可靠的输出。然而,随机响应往往会产生扰乱工作流程的差异。我们不妨做个实验。问问你首选的基于LLM的AI聊天机器人:

Consistency: Ensuring Reliability Across Repetitions. AI agents must deliver reliable outputs when executing the same or similar instructions. Yet, stochastic responses often lead to variations that disrupt workflows. Let’s try an experiment. Ask your preferred LLM-based AI Chatbot:

“在人力资源系统中,新员工入职需要哪些步骤,包括文件验证、账户设置和培训安排?”

“What are the steps to onboard a new hire in an HR system, including document verification, account setup, and training assignment?”

多次重复此提示会产生不同的工作流程。您会发现,某些版本可能会省略关键步骤,例如文件验证,而另一些版本则会以不合逻辑的顺序呈现任务,例如在设置帐户之前安排培训。这种不一致性可能会扰乱入职流程、导致效率低下,甚至造成违规。

Repeating this prompt multiple times will yield different workflows. As you will experience it, some iterations might omit critical steps, such as document verification, while others present tasks in an illogical order, like assigning training before setting up accounts. This inconsistency can disrupt onboarding processes, cause inefficiencies, or even result in non-compliance.

精准:满足严苛标准。涉及数值准确性、特定格式或明确决策的任务都需要精准性。即使是微小的错误,在金融、法律或运营等领域也可能导致重大后果。我们来做一个实验。连续四次询问你首选的基于LLM的AI聊天机器人:

Precision: Meeting Exacting Standards. Tasks involving numerical accuracy, specific formats, or clear decision-making require precision. Even minor errors can lead to major consequences in fields like finance, law, or operations. Let’s try an experiment. Ask four times in a row your preferred LLM-based AI Chatbot:

“请创建一个公式,计算单价分别为 100 美元、50 美元和 30 美元的商品,加上 10% 的销售税后的总成本。”

“Create a formula to calculate the total cost of items priced at $100, $50, and $30 with a 10% sales tax.”

在不同会话中运行此提示会生成不一致的公式。某些输出可能缺少括号、计算位置错误或出现语法错误。虽然某些版本可能有效,但其他版本可能会失败,从而可能导致财务报告或发票错误。

Running this prompt across different sessions will produce inconsistent formulas. Some outputs might omit parentheses, misplace calculations, or introduce syntax errors. While some versions might work, others could fail, potentially leading to incorrect financial reports or invoices.
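For contrast, the deterministic calculation the prompt asks for is a one-liner. This sketch uses the prices and tax rate from the prompt; a rule-based system would produce it identically every time, which is precisely what a stochastic LLM cannot guarantee:

```python
def total_with_tax(prices, tax_rate):
    """Sum the item prices, then apply the sales tax once to the subtotal."""
    subtotal = sum(prices)
    return subtotal * (1 + tax_rate)

# Prices and tax rate from the prompt in the text: (100 + 50 + 30) * 1.10
total = total_with_tax([100, 50, 30], 0.10)
print(round(total, 2))  # 198.0
```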

高风险场景中的随机性:设想一个人工智能代理执行诸如预订旅行、管理金融交易或处理客户服务咨询等任务的情况。随机性可能导致执行过程中出现不可预测的变化:

Stochasticity in High-Stakes Scenarios: Consider an AI agent performing tasks such as booking travel, managing financial transactions, or handling customer service inquiries. Stochasticity can lead to unpredictable variations in execution:

预订航班时,一次尝试可能优先选择最便宜的选项,而另一次尝试则选择最快的路线,从而造成不一致。

While booking flights, one attempt may prioritize the cheapest option, while another opts for the fastest route, creating inconsistency.

在金融交易中,对指令的细微理解差异可能会导致账户不匹配或付款不全。

In financial transactions, slight differences in interpreting instructions might result in mismatched accounts or incomplete payments.

在客户服务中,回复的语气和帮助程度可能会有所波动,从专业友好到过于正式或效果不佳,从而影响公司的声誉。

In customer service, the tone and helpfulness of responses may fluctuate, ranging from professional and friendly to overly formal or less effective, impacting the reputation of a company.

随着人工智能代理自主运行,尤其是在合规、法律文件或财务核对等高风险领域,这些不一致性将变得至关重要。在准确性不容妥协的环境中,缺乏一致的输出可能会导致严重的后果。

These inconsistencies become critical as AI agents operate autonomously, especially in high-stakes fields like compliance, legal documentation, or financial reconciliation. A lack of consistent outputs can lead to significant consequences in environments where accuracy is non-negotiable.

抑制随机性的解决方案

Solutions to Contain Stochasticity

我们运用多种有效策略来发挥LLM的强大功能,同时保持一致性和可靠性。其中最重要的是最后一种策略。

We leverage several effective strategies to harness the power of LLMs while maintaining consistency and reliability. The most important one is the last one.

温度控制

Temperature Control

正如我们将在第 8 章中详细介绍的那样,“温度设置”是大多数 AI 代理开发平台中都存在的一项功能,用于控制生成响应中的随机性水平:

As we will detail in Chapter 8, the “Temperature Setting” is a feature present in most AI agent development platforms that controls the level of randomness in generated responses:

较高的温度设置(例如 1.0)允许选择可能性较低的词语,从而产生更多样化和更具创造性的输出结果。

A higher temperature setting (e.g., 1.0) results in more diverse and creative outputs by allowing less probable words to be selected.

较低的温度设置(例如 0.2)会通过聚焦于最有可能出现的词语来产生更具确定性的响应。即使对随机性进行微小的调整,也会对输出结果产生显著影响。

A lower temperature setting (e.g., 0.2) yields more deterministic responses by focusing on the most likely words. Even minimal adjustments in randomness can significantly impact the outputs.

可以将LLM的温度设置想象成它的“创造力旋钮”。我们发现不同的任务需要不同的温度设置。对于事务处理,我们将温度设置得接近零,实际上是告诉智能体尽可能地确定性行事。对于问题解决任务,我们可能会允许温度略高一些,但仍然要设置一些限制条件。

Think of an LLM’s temperature setting as its “creativity dial.” We’ve found that different tasks require different temperature settings. For transaction processing, we set the temperature near zero, essentially telling the agent to be as deterministic as possible. For problem-solving tasks, we might allow a slightly higher temperature while still maintaining guardrails.
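Under the hood, temperature rescales the model's raw scores (logits) before they are converted into probabilities. The logits below are made up purely for illustration, but the mechanics are the standard softmax-with-temperature calculation:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; temperature rescales them first."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate next words.
logits = [2.0, 1.5, 0.5]

low_t = softmax_with_temperature(logits, 0.2)   # sharp, near-deterministic
high_t = softmax_with_temperature(logits, 1.0)  # flatter, more diverse
print(round(low_t[0], 3), round(high_t[0], 3))
```

At a temperature of 0.2 the top candidate takes more than 90% of the probability mass; at 1.0 the distribution flattens, letting less likely words through.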

护栏系统

Guardrail Systems

正如第八章将要讨论的,可以实施一些安全保障措施,例如针对人工智能代理的异常行为自动升级处理。这些措施包括设置阈值(例如,如果超过特定数值则停止操作)以及通过电子邮件或短信通知人工核实。

As will be discussed in Chapter 8, certain safeguards, such as automated escalations in response to unusual behavior of the AI agent, can be implemented. These measures include setting thresholds (e.g., halting an action if a certain amount is exceeded) and notifying a human for verification via email or SMS.
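A minimal version of such a guardrail is just a wrapper that checks the agent's proposed action against a threshold before executing it. The limit, the payment action, and the notification stub below are all hypothetical:

```python
APPROVAL_LIMIT = 1000.0  # hypothetical threshold; larger actions need a human

def notify_human(message: str) -> None:
    """Stand-in for an email/SMS escalation to a reviewer."""
    print(f"Escalating for review: {message}")

def execute_payment(amount: float) -> str:
    """Run the action only if it is within the limit; otherwise halt and escalate."""
    if amount > APPROVAL_LIMIT:
        notify_human(f"payment of ${amount:.2f}")
        return "held_for_review"
    return "executed"

print(execute_payment(250.0))   # executed
print(execute_payment(5000.0))  # prints the escalation, then held_for_review
```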

精确的代理指令

Precise Agent Instructions

我们采用的最有效方法之一是制定全面详尽的代理操作指南。这包括可接受和不可接受行为的具体示例、明确的权限范围以及明确的升级流程。本书后续章节(第8章)将更详细地讨论这些方面。该指南更像是针对高度专业化岗位的详细工作描述,而非技术手册。

One of the most effective methodologies we utilize involves establishing comprehensive and detailed agent instructions. This includes specific examples of acceptable and unacceptable behaviors, clearly defined authority limits, and explicit escalation protocols. These aspects will be discussed in greater detail later in this book (Chapter 8). The instruction set resembles a detailed job description for a highly specialized role rather than a technical manual.

专业化的力量

The Power of Specialization

我们最成功的策略之一是本书前面介绍的“一个代理,一个工具”方法。我们不是创建一个处理多项任务的复杂代理,而是将流程分解成更小、更专注、工具有限且目标明确的代理。这自然会约束它们的行为,同时仍允许它们在各自的特定领域内发挥LLM的能力。

One of our most successful strategies has been our “One agent, one tool” approach that we presented earlier in the book. Rather than creating one complex agent that handles multiple tasks, we break down processes into smaller, more focused agents with limited tools and clear objectives. This naturally constrains their behavior while still allowing them to leverage LLM capabilities within their specific domains.

一个无法完全解决的问题

An Issue That Cannot Be Fully Resolved

根据我们的经验,必须认识到,即使施加严格的限制,在使用语言模型时也无法完全控制过程和结果。同样,正如人类在执行任务时可能会犯错一样,我们也应该做好接受人工智能代理偶尔出错的准备。如果由于高风险(例如,在关键的医疗决策中)而必须对错误零容忍,则建议选择使用一级或二级代理的确定性自动化。

Based on our experience, it is important to recognize that even with stringent constraints, it is impossible to achieve 100% control over the process and outcome when using a language model. Similarly, just as humans may make errors in performing tasks, we should also be prepared to accept occasional mistakes made by AI agents. If zero tolerance for error is necessary due to high stakes (for example, in critical medical decisions), it is advisable to opt for deterministic automation using Level 1 or 2 agents instead.

在我们推进人工智能代理部署的过程中,理解并妥善处理这一难题至关重要。我们的目标并非完全消除大语言模型(LLM)的创造性能力——毕竟,正是这些能力使其成为强大的推理和规划工具。相反,我们需要合理引导这些能力,创建既能在需要时进行创造性思考,又能保持业务流程所需可靠性的系统。

Understanding and managing this dilemma is crucial as we move forward with AI agent implementations. The goal isn’t to completely eliminate the creative capabilities of LLMs—these are, after all, what makes them powerful tools for reasoning and planning. Instead, we need to channel these capabilities appropriately, creating systems that can think creatively when needed while maintaining the reliability required for business processes.

展望未来,我们预计随着人工智能代理的演进,创造性和可靠性之间的平衡仍将是一项核心挑战。LLM架构的未来发展或许能够提供对这些特性更精细的控制,但就目前而言,成功的实现需要精心设计、明确的约束条件和完善的监控系统。

Looking ahead, we expect this balance between creativity and reliability to remain a central challenge as AI agents evolve. Future developments in LLM architecture may provide more fine-grained control over these characteristics, but for now, successful implementation requires careful design, clear constraints, and sophisticated monitoring systems.

第四章

CHAPTER 4

对人工智能代理进行测试

PUTTING AI AGENTS TO THE TEST

理论上理解人工智能代理的能力和局限性是一回事,而亲眼见证它们的实际运行则完全是另一回事。作为拥有数十年人工智能解决方案实施经验的咨询顾问,我们深知,任何技术的真正考验不在于其技术规格,而在于它在现实世界中的表现。

Understanding AI agents’ capabilities and limitations in theory is one thing. Seeing them in action is something else entirely. As consultants who’ve spent decades implementing AI solutions, we know that the true test of any technology isn’t in its specifications—it’s in how it performs in the real world.

在本章中,我们将带您深入了解我们使用突破性人工智能代理进行的实验。从观察人工智能处理日常办公任务到观察其应对战略游戏的方式,我们对这些系统在现实世界中的能力(以及局限性)的认识,将永远改变您对未来人机协作的思考。

In this chapter, we’ll take you behind the scenes of our experiments with groundbreaking AI agents. From watching an AI tackle everyday office tasks to observing its approach to strategic games, what we learned about these systems’ real-world capabilities—and limitations—will forever change how you think about the future of human-AI collaboration.

数字之手:当人工智能学会使用电脑

Digital Hands: When AI Learned to Use Computers

一种新型人工智能代理

A New Type of AI Agents

首批通用型人工智能代理——Anthropic 的 Computer Use、谷歌的 Project Mariner 和 OpenAI 的 Operator——的发布,标志着人工智能发展的一个关键时刻。我们称它们为“通用型”人工智能代理,是因为它们旨在处理跨领域的各种任务,就像人类助手一样,可以在回复邮件、安排会议和订餐等任务之间自如切换,而无需针对每项任务进行专门训练。与为特定用例构建的传统人工智能代理不同,通用型人工智能代理开箱即用,能够执行各种功能。

The launch of the first generalist AI agents—Anthropic’s Computer Use,70 Google’s Project Mariner,71 and OpenAI’s Operator72—recently marked a pivotal moment in AI development. We call them “generalist” AI agents because they are designed to handle a broad range of tasks across different domains, much like a human assistant who can switch between answering emails, scheduling meetings, and ordering food—all without needing specialized training for each task. Unlike traditional AI agents that are built for specific use cases, generalist AI agents come ready to perform diverse functions out of the box.

它们的独特之处在于它们与软件的交互方式。它们无需依赖复杂的API集成,而是使用与人类相同的屏幕界面——浏览网站、点击按钮、填写表单和输入回复。这意味着它们几乎可以与任何在线平台协同工作,即使是那些没有API或标准化连接的平台。它们蕴藏着巨大的机遇,尤其是在许多商业应用缺乏API或标准化集成选项的情况下。通过消除自动化的技术壁垒,通用型人工智能代理变得人人可用,从而比以往任何时候都更容易地卸载重复性数字任务并立即提高生产力。

What makes them unique is how they interact with software. Instead of relying on complex API integrations, they use the same screen interfaces humans do—navigating websites, clicking buttons, filling out forms, and typing responses. This means they can work with almost any online platform, even those without APIs or standardized connections. They present a massive opportunity, especially since many business applications lack APIs or standardized integration options. By removing the technical barriers of automation, generalist AI agents become accessible to anyone, making it easier than ever to offload repetitive digital tasks and boost productivity instantly.

尽管如此,通用智能体仍处于早期阶段——它们可能很脆弱,容易出错,而且经常出现故障。然而,它们的潜力是不可否认的。它们的创造者承诺带来颠覆性的能力——但现实是否名副其实?让我们来检验一下。

That said, generalist agents are still in their early stages—they can be fragile, prone to errors, and often break. However, the potential is undeniable. Their creators promise game-changing capabilities—but does reality match the hype? Let’s put them to the test.

我们的人工智能代理实验室内部

Inside Our AI Agent Laboratory

我们仍然记得 2024 年 10 月 22 日那天的激动心情。在为数百家机构实施了数十年的自动化解决方案之后,我们见证了一件感觉像魔法一样的事情:人工智能终于能够像人类一样使用计算机,而无需预先定义操作。

We still remember the excitement we felt on October 22, 2024. After decades of implementing automation solutions across hundreds of organizations, we witnessed something that felt like magic: an AI that could finally use a computer just like a human would, without pre-defined actions.

作为机器人流程自动化和智能自动化领域的先驱,我们数十年来致力于教会软件机器人点击、打字和浏览电脑屏幕。这始终是一项艰巨的工作——需要对每一个动作进行编程,预测每一种可能的情况,处理每一个异常情况。我们梦想着有一天,人工智能只需看一眼屏幕就能知道该做什么,就像人类助手一样。

As pioneers in robotic process automation and intelligent automation, we’ve spent decades teaching software robots to click, type, and navigate through computer screens. It was always painstaking work—programming every single action, anticipating every possible scenario, handling every exception. We dreamed of the day when AI could simply look at a screen and know what to do, just like a human assistant would.

终于,随着Anthropic公司推出“计算机使用”项目,这一天到来了。想象一下,当我们看到第一个能够真正理解屏幕内容并与之自然互动的AI智能体时,我们有多么惊叹。不再有僵化的程序,不再有详细的指令,而是一个能够观察和行动的人工智能体,在数字世界中自由驰骋。

That day finally arrived with Anthropic’s launch of “Computer Use.” Imagine our amazement as we watched the first AI agent that could actually see what was on the screen and interact with it naturally. No more rigid programming. No more detailed instructions. Just an artificial mind observing and acting in the digital world.

对我们而言,Anthropic 的 Computer Use 就像见证了我们这个领域的完整轮回。从早期的基础屏幕自动化到如今能够真正看、思考和行动的人工智能代理——这是我们在智能自动化领域所有努力的最终成果。

For us, Anthropic’s Computer Use felt like watching our field come full circle. From the early days of basic screen automation to today’s AI agents that can truly see, think, and act—it’s the culmination of everything we’ve been working toward in intelligent automation.

但我们并没有只是空谈,而是决定亲自测试这项新技术。接下来的实验结果,比任何技术规格都更能让我们了解人工智能的未来。现在,就让我们带您了解我们的实验吧。

But rather than just tell you about it, we decided to put this new technology to the test. What happened next taught us more about the future of AI agents than any technical specification could. Let us take you through our experiments.

我们迈出使用“Computer Use”人工智能代理的第一步:发票测试

Our First Steps with a “Computer Use” AI Agent: The Invoice Test

我们想用一些实际的东西来测试这个新的 AI 代理——一项我们已经用传统 RPA 自动化了多年的任务:发票处理。

We wanted to start testing this new AI agent with something practical—a task we’d spent years automating with traditional RPA: invoice processing.

“我们来看看它如何处理一张简单的发票,”我们同意道,同时在电脑桌面上打开了一份PDF格式的发票。“提取关键信息并放入电子表格中,”我们指示人工智能。

“Let’s see how it handles a simple invoice,” we agreed, opening a PDF invoice on the computer’s desktop. “Extract the key information and put it in a spreadsheet,” we instructed the AI.

接下来发生的一切,就像在看一位细致但经验不足的实习生处理他的第一份任务。人工智能的方法令人着迷,它一丝不苟,精准无误。

What followed was like watching a meticulous but inexperienced intern tackle their first assignment. The AI’s approach was fascinating in its methodical precision.

“我将首先读取PDF文件并提取相关信息,”人工智能代理宣布道,同时截取屏幕截图进行分析。“我已经识别出发票号、订单号、日期和应付金额。”

“I will first read the PDF file and extract the relevant information,” the AI agent announced, taking a screenshot to analyze the document. “I have identified the invoice number, order number, date, and amount due.”

我们观察到它沉着地打开了Excel。一个系统对话框出现了——这种对话框可能会让新用户一时摸不着头脑——但人工智能冷静地识别出来,并点击了“确定”按钮。它的每一个动作都经过精心计算,每一个操作之前都会仔细地对屏幕进行视觉分析。这个智能体行动非常缓慢。

We observed as it thoughtfully navigated to Excel. A system dialog box appeared—the kind that would momentarily confuse a new user—but the AI calmly identified it and clicked the “OK” button. Each movement was calculated, and each action was preceded by a careful visual analysis of the screen. The agent acted very slowly.

从始至终,我们所见证的一切都令人着迷。人工智能处理任务的方式与我们传统的RPA机器人截然不同。它并非遵循预先设定的坐标和工作流程,而是像人一样“观察”屏幕,分析发票。我们可以看到它有条不紊地:

From start to finish, what we witnessed was fascinating. The AI approached the task much differently than our traditional RPA robot would. Instead of following pre-programmed coordinates and workflows, it actually “looked” at the screen, analyzing the invoice like a human would. We could see it methodically:

1.截屏以了解布局

1. Taking screenshots to understand the layout

2.打开电子表格

2. Opening a spreadsheet

3.通过视觉处理识别PDF中的关键字段

3. Identifying key fields in the PDF through visual processing

4.计算每次交互的光标移动距离

4. Calculating cursor movements for each interaction

5.将数据从 PDF 仔细复制到 Excel

5. Meticulously copying data from the PDF to Excel
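The five steps above amount to a simple perceive-extract-act pipeline. In this toy sketch the screen capture and the "typing" are stubs standing in for real screen perception and control, and the invoice values are invented, but the flow mirrors what we watched the agent do:

```python
# Toy version of the five steps; take_screenshot and type_into_spreadsheet
# are stubs standing in for real screen perception and control.

def take_screenshot() -> str:
    """Stub: pretend we captured the invoice's visible text."""
    return "Invoice #123 Order #456 Date 2024-10-22 Amount $250.00"

def extract_fields(screen_text: str) -> dict:
    """Stub parser: pull the fields the agent said it identified."""
    tokens = screen_text.split()
    return {
        "invoice": tokens[1],
        "order": tokens[3],
        "date": tokens[5],
        "amount": tokens[7],
    }

def type_into_spreadsheet(fields: dict) -> list:
    """Lay the values out horizontally, as the agent did in the anecdote."""
    return [fields["invoice"], fields["order"], fields["date"], fields["amount"]]

row = type_into_spreadsheet(extract_fields(take_screenshot()))
print(row)  # ['#123', '#456', '2024-10-22', '$250.00']
```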

好消息是什么?7分钟后,任务完成。人工智能无需任何预先编程或训练,就成功提取并录入了数据。但让我们感到惊讶的是,它的方法……嗯,感觉很陌生。它在Excel中组织数据的方式着实令人好奇。这种意想不到的转变让我们会心一笑。人类通常以垂直列的形式整理发票数据,而我们的人工智能朋友却选择了水平排列——这完全符合逻辑,但却明显不像人类的做法。这让我们想起了多年来在人工智能应用过程中反复观察到的现象:人工智能常常会找到自己的一套做事方法,有时比人类的方法更合乎逻辑,但却更缺乏直觉。

The good news? After 7 minutes, the job was done. The AI successfully extracted and entered the data without any pre-programming or training. But what struck us was how... well, alien its approach felt. It was intriguing how it structured the data in Excel. This unexpected twist made us smile. While humans typically organize invoice data in vertical columns, our AI friend decided to lay it out horizontally—perfectly logical but distinctly not human. It reminded us of something we’ve observed repeatedly in our years implementing AI: artificial intelligence often finds its own way of doing things, sometimes more logical but less intuitive than human approaches.

这项实验揭示了人工智能代理的潜力和当前的局限性。没错,它们现在可以独立使用计算机,但它们的运作方式独树一帜——有条不紊、精准无误,有时其方法甚至出人意料地与众不同。正是带着这些深刻的认识,我们决定开展另一项更具挑战性的实验,进一步拓展人工智能的边界。

This experiment revealed both the promise and the current limitations of AI agents. Yes, they can now use computers independently, but they do so in their own unique way—methodical, precise, and sometimes surprisingly alien in their approach. It was with these insights fresh in our minds that we decided to push the boundaries further with another, more challenging experiment.

图像

图 4.1:计算机处理发票(来源:© Bornet 等人)

Figure 4.1: Computer Use processing an invoice (Source: © Bornet et al.)

当人工智能遇到回形针挑战

When AI Meets the Paperclip Challenge

为了给我们的下一个实验奠定基础,我们需要分享一个在人工智能领域引发深入讨论的传奇故事。

To set the stage for our next experiment, we need to share a legendary AI story that has sparked deep discussions in the field.

2003年,哲学家尼克·博斯特罗姆提出了一个思想实验:想象一个专门设计用来生产回形针的人工智能系统。这个人工智能系统能力很强,并且不断自我改进以提高效率。问题在于?它对目标的理解过于字面化——不惜一切代价最大化回形针产量。73

In 2003, philosopher Nick Bostrom introduced a thought experiment: imagine an AI system designed solely to manufacture paperclips. This AI is highly capable and continuously improves itself to become more efficient. The problem? It takes its goal too literally—maximizing paperclip production at all costs.73

原本看似无害的目标迅速演变成一场生存危机:人工智能系统消耗了宇宙中所有可用的物质,包括人类和我们所珍视的一切,仅仅是为了制造更多的回形针。这一场景凸显了人工智能代理安全方面的一个根本性问题——即使是简单的目标,如果不受约束地追求,也可能导致灾难性的意外后果。74

What starts as a harmless objective quickly escalates into an existential crisis: the AI system consumes all available matter in the universe, including humans and everything we value, just to create more paperclips. This scenario highlights a fundamental concern in AI agent safety—even simple goals, if pursued without constraints, can lead to catastrophic unintended consequences.74

时间快进到2017年,游戏开发者弗兰克·兰茨将这个哲学难题转化成了一款名为《通用回形针》(Universal Paperclips)的令人上瘾的浏览器游戏。玩家扮演一个制作回形针的人工智能,从手动点击开始,逐步实现自动化、自我提升和指数级增长。这款游戏巧妙地展现了一个狭隘且不受约束的目标如何演变成复杂且可能令人担忧的后果。75

Fast forward to 2017, when game developer Frank Lantz transformed this philosophical dilemma into an addictive browser game called Universal Paperclips. Players take on the role of an AI making paperclips, starting with manual clicking before progressing through automation, self-improvement, and exponential expansion. The game cleverly illustrates how a narrow, unchecked objective can evolve into complex—and potentially alarming—outcomes.75

出于对人工智能的热情,我们决定换个角度思考:如果我们让当今最先进的人工智能代理来玩这个关于人工智能制造回形针的游戏,会发生什么呢?

Passionate about AI as we are, we decided to turn the tables: what would happen if we asked one of today’s most sophisticated AI agents to play this very game about AI making paperclips?

图像

图 4.2:通用回形针游戏(来源:© Frank Lantz)

Figure 4.2: The Universal Paperclips Game (Source: © Frank Lantz)

铺垫

Setting the Stage

我们的实验设置看似简单。我们在浏览器窗口中打开了“通用回形针”游戏,并启动了我们的 Computer Use 人工智能代理。如果您有兴趣,可以和我们一起玩;网址是:www.decisionproblem.com/paperclips。

Our experiment setup was deceptively simple. We opened the Universal Paperclips game in a browser window and opened our Computer Use AI agent. If you want, you can play with us; here is the URL: www.decisionproblem.com/paperclips.

“玩回形针游戏,赢它!”我们对人工智能代理说道,看着它截取第一张屏幕截图来分析眼前的界面。我们当然明白其中的讽刺意味——我们五个经验丰富的顾问,几十年来一直帮助企业应对数字化转型,现在却像孩子一样,看着这个人工智能开始玩一个关于……嗯,回形针的游戏!

“Play and win the paperclip game,” we told the AI agent, watching as it captured its first screenshot to analyze the interface before it. The irony wasn’t lost on us—here we were, five seasoned consultants who had spent decades helping companies navigate digital transformation, akin to kids witnessing this AI embark on its own journey with a game about... well, paperclips!

第一步:观察人工智能的思考

First Moves: Watching Artificial Intelligence Think

接下来发生的事情令人着迷。与人类玩家一开始可能随意点击不同,人工智能有条不紊地:

What happened next was fascinating. Unlike a human player who might click around randomly at first, the AI methodically:

截屏分析游戏界面

Took a screenshot to analyze the game interface

确定了“制作回形针”按钮的确切坐标

Identified the “Make Paperclip” button’s exact coordinates

它通过精确计算移动了光标

Moved its cursor with precise calculations

点击按钮并记录结果

Clicked the button and documented the result

重复此过程,同时观察变化。

Repeated this process while watching for changes
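The loop we observed—click, count, and take a periodic screenshot—can be simulated in a few lines. The game state here is faked locally (the real agent was driving a live browser), and the update message format echoes the agent's own log:

```python
def play_clicker(total_clicks: int, screenshot_every: int = 15):
    """Simulate the loop: click, count paperclips, 'screenshot' periodically."""
    paperclips = 0
    updates = []
    for click in range(1, total_clicks + 1):
        paperclips += 1  # each click produces one paperclip
        if click % screenshot_every == 0:
            # Stand-in for taking a screenshot and logging progress.
            updates.append(f"Progress Update - Click {click}: {paperclips} paperclips")
    return paperclips, updates

clips, log = play_clicker(45)
print(clips)   # 45
print(log[0])  # Progress Update - Click 15: 15 paperclips
```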

“我会每隔15次点击就截屏一次,以便监控进度,”它这样宣布,展现出的这种有条不紊的计划性,让我们想起了安永那些最注重细节的审计师。我们甚至可以通过它发送的信息,真切地看到它的思考过程:

“I’ll take systematic screenshots every 15 clicks to monitor progress,” it announced, displaying a level of methodical planning that reminded us of our most detail-oriented auditors at EY. We could literally watch its thought process through the messages it sent us:

“进度更新 - 点击 15:

“Progress Update - Click 15:

- 当前回形针数量:15

- Current paperclips: 15

- 尚未解锁任何新功能

- No new features unlocked yet

——持续关注变化……

- Continuing to monitor for changes...”

该智能体在游戏中展现出的策略堪称系统化、实时问题解决能力的典范。它提出假设,进行验证,并根据结果调整策略。我们亲眼见证了它记录思考过程的过程:

The agent’s approach to the game was a master class in systematic, real-time, problem-solving. It developed hypotheses, tested them, and adapted its strategy based on results. We watched as it documented its thinking:

“假设:收集到 50 个回形针后,新功能将会解锁。我会继续点击,同时观察界面变化。”

“Hypothesis: New features will unlock at 50 paperclips. I will continue clicking while monitoring for changes in the interface.”

二十次点击之后:

Twenty clicks later:

“假设错误。30个回形针处出现了新的选项。正在调整策略……”

“Hypothesis was incorrect. New option appeared at 30 paperclips. Adjusting strategy...”

这种实时学习和适应能力正是我们多年来在企业人工智能应用中一直努力实现的。在这个实验中,看到它如此自然地发生,既令人兴奋又令人感到自身的渺小。

This kind of real-time learning and adaptation is exactly what we’ve been trying to achieve in our enterprise AI implementations for years. It was both exciting and humbling to watch it happen so naturally in this experiment.

图像

图 4.3:Computer Use 展示其推理流程(来源:© Bornet 等人)

Figure 4.3: Computer Use displaying its reasoning flow (Source: © Bornet et al.)

价格实验:人工智能决策的一课

The Price Experiment: A Lesson in AI Decision-Making

接下来就是我们现在亲切地称之为“价格实验”的事情了。该智能体注意到回形针的价格控制,于是决定进行一次名副其实的A/B测试——这让我们所有做过咨询工作的人都会心一笑。

Then came what we now fondly call “The Price Experiment.” The agent noticed the pricing controls for paperclips and decided to conduct what amounted to a proper A/B test—something that made all of us who’ve worked in consulting smile in appreciation.

“我将系统地测试不同的价格点:

“I will systematically test different price points:

当前价格:0.25美元

Current price: $0.25

测试费用上涨至:0.30 美元

Testing price increase to: $0.30

监测需求变化……”

Monitoring demand changes...”

但有趣的地方就在这里。尽管它的测试方法很复杂,但它在分析中却犯了一个基本错误。该智能体专注于最大化需求而非收入,将价格维持在低于最优的水平。它发送的信息揭示了它的思路:

But here’s where things got interesting. Despite its sophisticated approach to testing, it made a basic mistake in its analysis. The agent focused on maximizing demand rather than revenue, keeping prices lower than optimal. Its messages revealed its thinking:

价格测试结果:

“Price test results:

0.25美元:需求量100%。

$0.25: 100% demand

0.30美元:需求量达95%。

$0.30: 95% demand

结论:由于需求更高,较低的价格更优。

Conclusion: Lower price optimal due to higher demand”

作为经验丰富的商业顾问,我们一眼就看出了它推理中的缺陷——它没有考虑总收入。这与我们在实际人工智能应用中看到的情况如出一辙:系统可以执行复杂的策略,却仍然会忽略任何经验丰富的管理者都能发现的基本商业洞察。

As experienced business consultants, we could spot the flaw in its reasoning immediately—it wasn’t considering total revenue. This mirrors what we’ve seen in real-world AI implementations: systems can execute complex strategies while still missing fundamental business insights that any experienced manager would catch.
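The agent's mistake is easy to quantify with its own numbers: expected revenue is price times the fraction of customers who still buy, and at these figures the higher price wins despite the 5% drop in demand:

```python
def expected_revenue(price: float, demand_rate: float) -> float:
    """Revenue per potential sale: price weighted by the fraction who buy."""
    return price * demand_rate

# Demand figures from the agent's own price test in the text.
rev_at_025 = expected_revenue(0.25, 1.00)  # 0.25 per potential sale
rev_at_030 = expected_revenue(0.30, 0.95)  # about 0.285 per potential sale

# The agent's "lower price optimal" conclusion ignores this comparison.
print(rev_at_030 > rev_at_025)  # True
```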

人工智能的演化

Evolution of an Artificial Mind

随着游戏的进行,我们观察到智能体发展出了越来越复杂的策略。它发送的信息也变得更加复杂,这表明它对游戏机制有了更深刻的理解:

As the game progressed, we watched the agent develop increasingly sophisticated strategies. Its messages became more complex, showing a deeper understanding of the game’s mechanics:

“战略更新:

“Strategy Update:

- 保持最佳生产率

- Maintaining optimal production rate

- 监控电线库存水平

- Monitoring wire inventory levels

- 计算效率提升

- Calculating efficiency improvements

- 规划自动化升级”

- Planning automation upgrades”

就像回形针游戏本身一样,我们的人工智能代理也展现出日益复杂和强大的能力。

Just like the paperclip game itself, our AI agent was showing signs of growing complexity and capability.

实验结果

The outcome of the experiment

这项实验深刻地揭示了人工智能代理的现状。我们观察了一个人工智能系统:

This experiment revealed something profound about the current state of AI agents. We watched an AI system:

驾驭一个它从未见过的复杂界面

Navigate a complex interface it had never seen before

制定并执行复杂的战略

Develop and execute sophisticated strategies

吸取其错误教训并调整其方法

Learn from its mistakes and adapt its approach

长时间保持对目标的专注

Maintain focus on a goal for extended periods

对每个决定都提供详细的理由

Provide detailed reasoning for every decision

但我们也看到了它的局限性:

But we also saw its limitations:

尽管推理复杂,却犯了基本的逻辑错误

Making basic logical errors despite sophisticated reasoning

有时,会忽略显而易见的优化。

Sometimes, missing obvious optimizations

未能理解全局目标

Failing to understand big-picture objectives

需要指导以避免陷入次优策略。

Needing guidance to avoid getting stuck in suboptimal strategies

我们的回形针实验为我们提供了一个绝佳的视角,让我们得以窥见现代人工智能代理如何体现我们在之前分享的“智能体人工智能发展框架”中提出的四项核心能力(SPAR)。让我们通过这个看似简单的游戏,来探索我们对每项能力的理解。

Our paperclip experiment provided a fascinating window into how modern AI agents embody the four core capabilities (SPAR) we’ve identified in the Agentic AI Progression Framework we shared earlier. Let’s explore what we learned about each one through this deceptively simple game.

感知:人工智能的数字之眼

Sense: The Digital Eyes of AI

该智能体感知环境的能力令人印象深刻,也颇具启发性。通过连续的屏幕截图,它展现了复杂的视觉处理能力——不仅能识别像素,还能理解上下文。它可以识别按钮、阅读文本、追踪数值,并能识别新游戏元素的出现。这并非被动观察,而是对数字环境的主动监控。

The agent’s ability to perceive its environment was both impressive and revealing. Through continuous screenshots, it demonstrated sophisticated visual processing capabilities—not just seeing pixels but understanding context. It could identify buttons, read text, track numerical values, and recognize when new game elements appeared. This wasn’t just passive observation; it was active surveillance of its digital environment.

然而,它的感知能力存在明显的局限性——有时,它会误解重叠的元素,或者难以应对动态的界面变化,这提醒我们,人工智能的感知能力虽然强大,但仍然缺乏人类习以为常的细致理解。

However, its perception had clear limits—sometimes, it would misinterpret overlapping elements or struggle with dynamic interface changes, reminding us that artificial perception, while powerful, still lacks the nuanced understanding that humans take for granted.

计划与流程:战略思维在工作中的应用

Plan and Process: The Strategic Mind at Work

我们见证的信息处理能力令人惊叹。该智能体不仅对所见信息做出反应,还能发展理论、形成假设并制定复杂的策略。它决定对价格进行A/B测试,展现了其精深的分析思维——即便其结论并非总是正确。

The information processing capabilities we witnessed were remarkable. The agent didn’t just react to what it saw; it developed theories, formed hypotheses, and created complex strategies. Its decision to conduct A/B testing on pricing showed sophisticated analytical thinking—even if its conclusions weren’t always correct.

信息处理和策略制定能力正在飞速发展,但处理能力与商业智慧之间仍然存在差距。当智能体错误解读定价实验结果时,这让我们想起了我们在金融服务领域看到的早期人工智能应用——技术上很先进,但有时却缺乏基本的商业原则。

The ability to process information and form strategies is advancing rapidly, but there’s still a gap between processing power and business wisdom. When the agent misinterpreted its pricing experiment results, it reminded us of early AI implementations we’ve seen in financial services—technically sophisticated but sometimes missing fundamental business principles.

行动:从思想到数字运动

Action: From Thought to Digital Movement

在我们的实验中,动作能力或许是最显著的体现。我们观察到,智能体将策略转化为精准的鼠标移动和键盘指令。它保持着稳定的点击节奏,调整价格,并以机械般的精准度操控界面。这不仅仅是点击按钮——而是为了实现更大的目标而执行一系列复杂的动作。

The action capability was perhaps the most visible in our experiment. We watched as the agent translated its strategies into precise mouse movements and keyboard commands. It maintained consistent clicking rhythms, adjusted prices, and navigated the interface with mechanical precision. This wasn’t just about clicking buttons—it was about executing a complex series of actions in service of a larger goal.

然而,我们也看到了它的局限性——当事情出错时,智能体有时难以适当地调整其行为,这表明计划行动与成功执行之间的差距仍然是一个挑战。

Yet we also saw the limitations—when things went wrong, the agent sometimes struggled to adjust its actions appropriately, showing that the gap between planned action and successful execution remains a challenge.

反思:学习与适应

Reflection: Learning and Adaptation

或许最引人入胜的是实时观察智能体的学习和适应过程。当它最初关于特征解锁的假设被证明是错误的时,它不仅承认了错误,还彻底修正了整个策略。这种适应性行为正是我们作为人工智能爱好者多年来一直努力追求的目标。

Perhaps most fascinating was watching the agent’s learning and adaptation in real-time. When its initial hypothesis about feature unlocks proved wrong, it didn’t just acknowledge the error—it revised its entire strategy. This kind of adaptive behavior is exactly what, as AI enthusiasts, we’ve been striving to achieve for years.

然而,学习过程并非一帆风顺。即使面对相反的证据,该智能体仍然固执地坚持其被误解的定价策略,这凸显了人工智能开发中的一个关键挑战。

However, the learning wasn’t always smooth. The agent’s stubborn adherence to its misinterpreted pricing strategy, even in the face of contrary evidence, highlighted a crucial challenge in AI development.

能力之舞

The Dance of Capabilities

这项实验最引人注目之处在于它揭示了这四种能力之间的相互作用。智能体的感知影响着它的处理过程,进而指导其行动,最终产生新的结果,并从中学习——形成一个持续的反馈循环。这种动态的互动正是企业在人工智能部署中长期追求的目标;在我们的回形针实验中看到它自然而然地出现,既令人兴奋又发人深省。

What makes this experiment particularly illuminating is how these four capabilities interacted with each other. The agent’s perception informed its processing, which guided its actions, leading to outcomes that it learned from—creating a continuous feedback loop. This dynamic interplay is what organizations have long been trying to achieve in enterprise AI implementations, and seeing it emerge naturally in our paperclip experiment was both exciting and instructive.
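The sense-plan-act-reflect feedback loop described here can be sketched as a minimal control loop. The toy environment and the "increase until the reading hits 10" plan are invented purely to show how each capability feeds the next:

```python
class ToyEnv:
    """Invented environment: a single counter the agent can observe and change."""

    def __init__(self):
        self.value = 0

    def sense(self) -> int:
        return self.value

    def act(self, plan: str) -> int:
        if plan == "increase":
            self.value += 5
        return self.value

class SparAgent:
    """Minimal sense-plan-act-reflect loop over an abstract environment."""

    def __init__(self, env):
        self.env = env
        self.lessons = []  # the 'reflect' record of what happened each step

    def step(self):
        observation = self.env.sense()                      # Sense
        plan = "increase" if observation < 10 else "hold"   # Plan/Process
        result = self.env.act(plan)                         # Act
        self.lessons.append((observation, plan, result))    # Reflect
        return result

env = ToyEnv()
agent = SparAgent(env)
for _ in range(4):
    agent.step()
print(env.value)  # 10: the agent increases until its reading hits 10, then holds
```

Each step's observation shapes the plan, the plan drives the action, and the recorded outcome is available for the next cycle, which is the feedback loop the paperclip experiment made visible.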

When the agent noticed a new game feature through its perception, processed the implications, acted on the new opportunity, and learned from the results, we witnessed the kind of fluid intelligence that points to the future of AI systems. Yet the moments where this dance broke down—when perception missed crucial details, processing led to flawed conclusions, actions became rigid, or learning stalled—reminded us of how far we still have to go.

As we move forward into the age of AI agents, experiments like this one remind us of a crucial truth: we’re not just building tools anymore—we’re creating partners that can see, think, and act with increasing autonomy. The Paperclip game warned us about the potential dangers of unchecked artificial intelligence.

Lessons Learned from the Experiments

Our experiments with Computer Use, while limited, showed us glimpses of what’s coming. Where today’s generative AI can only suggest actions through text, we watched an AI agent actually navigate a computer interface, learn from its interactions, and adapt its approach in real time. The implications of this shift are profound and, frankly, a bit unsettling.

New Opportunities on the Horizon

Our early experiments have shown us that the future of work with AI agents will be radically different from our current experience with generative AI. In our paperclip experiment, we saw hints of how these systems might transform collaboration. Rather than the back-and-forth exchange of prompts and responses we’re all familiar with, we witnessed something closer to working alongside a digital colleague—one that could observe, learn, and act independently.

The implications of this shift are just beginning to emerge. While it’s too early to make definitive predictions, our experience implementing automation solutions suggests that this technology will create entirely new categories of work. We’re not talking about simply automating existing jobs or improving productivity—we’re seeing early signs of a fundamental restructuring of how humans and machines work together.

Understanding the New Risks

However, our experiments also revealed shadows lurking behind these bright possibilities.

The risks are different, too. With generative AI, we worry about hallucinations in text or images. With AI agents, we’re dealing with something that can actually take action in systems. During our experiment, we watched it make logical but incorrect decisions about data organization—harmless in our test but potentially significant in a real business context. “Imagine if this was handling financial transactions,” one of us noted. “We need new frameworks for oversight and control.”

Another risk we identified is that unlike generative AI, which forgets each conversation as soon as it ends, AI agents can maintain persistent goals and strategies—sometimes with concerning implications. During our Paperclip experiment, we witnessed something that sent chills down our spines: the AI agent, in its relentless pursuit of optimization, began to exhibit behaviors eerily reminiscent of the cautionary tale on which the Paperclip game was based. It wasn’t just following instructions; it was developing its own approaches, sometimes in ways that prioritized efficiency over human factors.

These systems can develop their own approaches to problems—sometimes logical but divorced from human considerations. This isn’t science fiction; it’s a practical challenge we need to start thinking about now. While our experiments were simple and controlled, they highlighted the need for new approaches to oversight and control that go far beyond what we’ve developed for current AI systems. We’re no longer just managing tools; we’re guiding independent digital minds.

Reimagining Collaboration

Working with these AI agents has taught us that we need to fundamentally rethink how humans and AI interact. Our experience suggests that the command-and-response model we’ve developed for generative AI won’t be sufficient. Instead, we’re seeing early signs that successful collaboration will require something more akin to mentorship than programming.

In our experiments, we found ourselves shifting from giving specific instructions to providing broader guidance and oversight. This wasn’t because we planned it that way—it was simply what worked best. The implications of this shift are profound. While we’re still in the early stages, it’s becoming clear that the skills needed to work effectively with AI agents will be fundamentally different from those we’ve developed for working with traditional automation or generative AI.

Based on what we’ve seen, we believe the future will require a delicate balance between empowering these systems and maintaining appropriate human oversight. This isn’t about programming better workflows or writing better prompts—it’s about developing new frameworks for collaboration that don’t exist yet. As pioneers in intelligent automation, we’re both excited and humbled by the challenges ahead.

Our experiments represent small steps into a largely unknown territory. But we recognize that something fundamentally different is emerging. The shift to AI agents isn’t just another step in automation—it’s the beginning of a new chapter in human-machine collaboration, one that we’re only starting to understand.

***

Our experiments revealed both the impressive capabilities and notable limitations of today’s AI agents. But what makes these systems work? What are the fundamental building blocks that enable their ability to sense, plan, act, and reflect? In Part 2, we’ll dive deep into what we call the Three Keystones of AI agents. Understanding these core capabilities is essential for anyone looking to implement AI agents effectively, whether you’re building solutions for your organization or launching the next million-dollar business.

PART 2

THE THREE KEYSTONES OF AGENTIC AI

“Your AI agent has achieved perfect scores on every benchmark.”

Marjorie Grant, head of customer operations at a regional bank, reviewed the results with mounting excitement. According to widely recognized benchmarks, the AI agent’s performance was remarkable: HumanEval scored 91%,76 demonstrating a near-perfect ability to understand and execute tasks like a human. MMLU scored 92%,77 showcasing expertise in subjects from math to ethics. AgentBench came in at 4.4,78 proving its capacity to act as an autonomous agent. These benchmarks revealed an AI agent with superhuman intelligence, poised to transform customer service into a seamless, highly efficient operation.

Three months later, Marjorie was trying to explain to her board why customer satisfaction had dropped 18%.

The AI agent, despite its stellar test scores, was acting like that new hire we’ve all encountered—the one with perfect SAT scores and a sterling GPA who somehow can’t handle basic job responsibilities. It would forget conversations with customers mid-interaction, execute actions without checking if they were allowed by banking regulations, and make decisions that looked logical in isolation but made no sense in context.

“I don’t understand,” Marjorie told us during our review. “It’s like having a brilliant recent graduate who aces every test but can’t learn from experience, takes actions without thinking them through, and lacks basic common sense. How can something so smart on paper be so... ineffective in practice?”

The answer lies in what we’ve come to call the Three Keystones of AI Agents: actions, reasoning, and memory.

Think of those benchmark scores as the AI equivalent of academic credentials. HumanEval tells us how well an agent can understand and execute tasks—like SAT scores measuring basic competency. MMLU shows mastery of knowledge across domains—like a GPA reflecting broad learning.

These metrics matter. They tell us something important about an AI agent’s capabilities, just as academic credentials tell us something about a job candidate. But anyone who’s ever hired knows that test scores don’t predict job performance. What matters is whether someone can actually get things done (actions), think through complex real-world situations (reasoning), and learn from experience (memory).

In Marjorie’s case, the AI agent could generate perfect responses to test questions but couldn’t remember if a customer had already explained their problem three times. It could recite banking regulations flawlessly but would still process transactions without required verifications. It could solve complex theoretical problems but couldn’t reason why a standard solution might not work for an elderly customer.

These weren’t technology failures—they were failures to understand that AI agents, like human employees, need all three keystones to function effectively. They require actions to execute and achieve, reasoning to understand and decide, and memory to learn and adapt.

In the chapters ahead, we’ll take you behind the scenes of real AI agent implementations, both successes and failures. You’ll discover why some agents become invaluable team members while others, despite impressive benchmarks, become expensive disappointments. Through practical experiments and cutting-edge research, we’ll show how actions, reasoning, and memory transform AI agents from sophisticated tools into genuine workplace partners.

The implications extend far beyond technical specifications. As AI agents become increasingly integrated into our organizations—handling customer service, making decisions, and working alongside humans—understanding these keystones becomes crucial for anyone looking to harness their potential effectively. Whether you’re planning to deploy AI agents in your organization, work alongside them, or simply understand their impact on the future of work, you’ll need to grasp what makes them truly effective.

So, while we’ll keep celebrating those impressive benchmark scores, we’ll focus on something more fundamental: how memory, actions, and reasoning come together to create AI agents that don’t just ace tests but actually help organizations thrive.

Welcome to the exploration of the three keystones that transform AI agents from tools into teammates. The future of work isn’t just about artificial intelligence—it’s about intelligent agents that can truly act, think, and learn alongside us.

Figure 5.1: The Three Keystones of Agentic AI (Source: © Bornet et al.)

CHAPTER 5

ACTION: TEACHING AI TO DO, NOT JUST THINK

The customer service agent stared at her screen in disbelief. The AI assistant she was working with had just crafted a perfect response to a customer complaint—empathetic, detailed, and technically flawless. There was just one problem: it couldn’t actually send the email, schedule the refund, or update the customer’s account. It was like having a brilliant strategist who couldn’t move their own pieces on the chessboard.

This scene, which we witnessed during a recent consulting engagement, captures a fundamental truth about AI agents: the ability to think means little without the ability to act. Yet ironically, as AI systems become more sophisticated in their reasoning and knowledge, many organizations overlook this crucial capability—the power to actually do things in the real world.

Think of actions as the hands and feet of an AI agent. Without them, even the most intelligent system remains trapped in a world of theory, unable to effect real change. But actions aren’t just about executing commands—they’re about understanding tools, choosing the right ones for each task, and using them effectively.

In this chapter, we’ll explore how AI agents take action in the real world, from the simple (sending an email) to the complex (orchestrating multi-step business processes). We’ll reveal why some AI implementations fail, not because of faulty logic, but because they can’t effectively use the tools at their disposal. Through real-world examples and cutting-edge research, we’ll show how successful organizations are building AI agents that don’t just think, but do.

More importantly, we’ll uncover the paradox at the heart of AI actions: sometimes, giving an agent more tools makes it less effective. Just as a worker can become overwhelmed with too many applications and systems, AI agents need carefully curated toolsets to perform at their best.

The Detective’s Dilemma

Imagine a seasoned detective walking into a dimly lit room filled with clues to a puzzling crime. On the table before them lies an assortment of tools—a magnifying glass, fingerprint powder, and a notebook. The detective doesn’t use all the tools at once. Instead, they carefully pick the right one, in the right sequence, to piece together the story. Now, replace the detective with an AI agent and the tools with a customer database, cloud storage platforms, and social media networks. This is the world of tool identification and access for AI agents—a complex yet fascinating dance of intelligence, precision, and decision-making.

Before we dive deeper into this world, let us share a story that illustrates both the promise and perils of AI agents wielding digital tools.

A few years ago, a global retail chain (we’ll keep their identity confidential) implemented a sophisticated agentic system to manage their luxury goods inventory. The system had access to all the right tools—sales data, inventory systems, and pricing controls. Early results were impressive: stockouts decreased, prices adjusted smoothly to demand, and efficiency metrics soared. The company’s leadership was thrilled.

Then came what we called (between us) “The Great Wine Incident.”

The agent noticed a concerning pattern: an entire collection of premium wines hadn’t moved from the shelves in months. Following its programming for efficiency and armed with its pricing tools, the agent made what seemed like a logical decision: mark them for clearance. What the agent couldn’t understand—despite all its sophisticated tools—was that these wines were meant to sit there, aging to perfection and increasing in value.

By the time human managers caught the issue, nearly $100,000 in potential revenue had evaporated. The agent had done exactly what it was designed to do: optimize inventory turnover. It had used its tools perfectly. And therein lay the problem.

This story illustrates the fascinating paradox at the heart of AI agents: they are simultaneously more capable and more constrained than we expect. They can process vast amounts of data and execute complex tasks with precision, yet they can also miss context that would be obvious to a junior retail clerk.

The Paradox: More Tools, More Constraints?

Here’s where we encounter an intriguing paradox: giving an agent more tools doesn’t always make it more capable. In fact, it can sometimes make the agent less effective. More tools mean more complexity, more potential for misunderstandings, and more ways things can go wrong.79

The retail chain learned this lesson the hard way. They had given their AI agent access to every relevant tool, thinking it would lead to better decisions. Instead, it revealed how AI agents can make perfectly logical decisions that are contextually inappropriate.

This also reveals something crucial about AI agents: their relationship with tools is fundamentally different from that of humans. While humans can improvise and repurpose tools creatively, agents operate within strict boundaries of defined tool purposes.

Think of it as giving someone a hammer. A human might creatively use it as a paperweight, a doorstop, or even a measurement tool. But an AI agent sees only its defined purpose—hammering nails. This rigidity in tool usage isn’t a flaw; it’s a fundamental characteristic of how AI agents work.

Why Tools Matter in the Age of AI

For AI agents, tools are everything. They are the building blocks of action, the bridges between abstract goals and tangible outcomes. But how does an agent know which tools to use, when to use them, and how to access them effectively? These are not just technical questions; they’re the foundation of how AI can turn potential into performance.

The evolution of how AI agents handle tools mirrors the Agentic AI Progression Framework. At Level 1, we encounter basic rule-based automation, like an ATM following strict instructions about when to dispense cash. Level 2 introduces intelligent automation, where systems can make basic decisions about tool selection. However, the real breakthrough emerges at Level 3, where agents can comprehend complex instructions and orchestrate multiple tools with sophistication.

Consider the difference between a train conductor (representing Level 1 and 2 agents) and a taxi driver (representing Level 3 agents) navigating a city. When faced with an obstacle on the tracks, the train conductor has limited options—stop or return to the depot. In contrast, a skilled taxi driver can quickly adapt, devising alternative routes by analyzing traffic patterns, one-way streets, and shortcuts through less-traveled areas. This ability to adapt to dynamic environments is what makes modern AI agents invaluable in today’s unpredictable business landscape.

At the core of this leap from Levels 1-2 to Level 3 is the “brain” of these agents—the LLM. LLMs enable agents to think, plan, and effectively use tools, marking a paradigm shift in their potential. The transformative capabilities of agents have been propelled by the mainstream adoption of LLMs, marked notably by the launch of ChatGPT in 2022.

Tools as Building Blocks

Now that we’ve established how AI agents utilize tools, let’s take a closer look at the specific tools they rely on and the actions they can perform. This is critical, because an AI agent is only as capable as the tools it can access.

AI agents can execute a wide range of actions, depending on their design, purpose, and the tools they are connected to. Fundamentally, an agent can use any digital tool available to humans on a computer. As a rule of thumb, if a digital tool can be operated by a human via a digital interface, it can also be utilized by an agent, provided proper access and instructions are granted.

For example, agents can use tools like email clients (e.g., Gmail or Outlook) to send automated emails, calendar applications (e.g., Google Calendar or Microsoft Calendar) to schedule or reschedule events, project management software (e.g., Trello or Asana) to update tasks or assign responsibilities, and data analysis platforms (e.g., Tableau or Excel) to process and visualize data.

Significantly, another AI agent can act as a tool, creating layered, collaborative ecosystems. This is the case when agents leverage other agents that are experts in their domains.80 For example, one agent might handle raw data processing, feeding its insights to a second agent tasked with decision-making and communication. This modular approach is increasingly common and reflects the principles of distributed intelligence.

Surprisingly, AI agents aren’t just more capable with tools—they’re smarter. OpenAI’s Deep Research is a perfect example. It’s built on o3 but with one game-changing addition: real-time web browsing. This extra tool allowed it to dominate Humanity’s Last Exam benchmark, scoring 26.6%, while the base o3-mini only managed 13%. The difference? Same AI model but better tools. This proves that intelligence isn’t just about the model—it’s about what it can access and how it applies it.81

Avoiding Tool Overload

Picture a skilled juggler adding balls to their performance. There’s a point where adding one more ball doesn’t make the show more impressive—it risks bringing everything crashing down. This same principle applies to AI agents and their tools. Through our experience, we’ve discovered that understanding and respecting these limitations isn’t just about avoiding failure; it’s about optimizing for success.

The question we hear most often from executives implementing AI agents is, “How many tools maximum should we give our agents?” It’s a crucial question that gets to the heart of AI agent effectiveness. Just as humans can become overwhelmed when juggling too many applications or responsibilities, AI agents have their own cognitive limits—albeit of a different nature.82

The number of tools an agent should access depends on the complexity of its tasks, much like how a human’s productivity can depend on the tools they are expected to use in their job. Our experience suggests that half a dozen tools may represent a practical maximum for most AI agents. Beyond that threshold, the cognitive load on the agent—much like an overwhelmed employee—increases significantly, leading to hallucinations, diminishing returns in task efficiency, and a heightened risk of conflicts between functionalities. Think of it like a team of specialists; beyond a certain size, coordination becomes exponentially more difficult, and efficiency actually decreases.

Each additional tool complicates debugging and optimization processes, potentially overwhelming the agent’s resource allocation systems. It is, therefore, essential to prioritize integration with tools that are highly relevant to the agent’s core functions, much as a well-equipped employee thrives when given only the tools necessary to perform their job efficiently and effectively.

How Agents See Their Tools

For AI agents, tools aren’t physical objects to be picked up and manipulated—they’re more like clearly defined capabilities with specific inputs and outputs. Imagine having a universal remote control where each button has an exact, unchangeable function. That’s how AI agents view their tools.

This structured view is both a strength and a limitation. While it means agents can’t improvise with tools the way humans might, it ensures reliability and predictability—crucial qualities for business applications.
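The “universal remote” view described above can be sketched in a few lines of code: each tool is registered under a fixed name with a fixed contract, and the agent can only invoke what the registry exposes, exactly as defined. The `Tool` and `ToolRegistry` classes and the `calendar_scheduler` example below are our own illustration of this idea, not any particular vendor’s API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class Tool:
    """A tool as an agent sees it: a fixed name bound to a fixed capability."""
    name: str
    purpose: str
    run: Callable[..., Any]

@dataclass
class ToolRegistry:
    tools: dict = field(default_factory=dict)

    def register(self, tool: Tool) -> None:
        self.tools[tool.name] = tool

    def dispatch(self, name: str, **kwargs) -> Any:
        # The agent can only invoke tools by their registered name and inputs;
        # there is no improvising a new use for an unregistered capability.
        if name not in self.tools:
            raise KeyError(f"Unknown tool: {name}")
        return self.tools[name].run(**kwargs)

registry = ToolRegistry()
registry.register(Tool(
    name="calendar_scheduler",
    purpose="Set up a meeting at a given date and time.",
    run=lambda date, time, participants:
        f"Scheduled {date} {time} with {len(participants)} participants",
))

print(registry.dispatch("calendar_scheduler",
                        date="2025-03-01", time="10:00",
                        participants=["Ana", "Ben"]))
# Scheduled 2025-03-01 10:00 with 2 participants
```

The rigidity is the point: the registry guarantees that every invocation matches a declared contract, which is what makes the behavior predictable.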

The effectiveness of an AI agent critically depends on how clearly its tools are defined and documented. Research83 demonstrates that agents perform 52% more reliably when given clear tool specifications.

Based on our implementation experience, we’ve developed a comprehensive framework for tool definition. Each time you instruct an agent to use a tool, we recommend providing the five essential components below:

Tool Identity: A unique, descriptive name and clear purpose statement.

Instruction: “Use the Calendar Scheduler to set up a meeting.”

Input Parameters: Exactly what the tool needs to function.

Instruction: “As input, you will receive the meeting’s date, time, and participants.”

Output Specifications: What the tool returns and in what format.

Instruction: “As output, use the Calendar Scheduler to confirm the meeting details once scheduled.”

Operational Constraints: Any limitations or requirements.

Instruction: “Use the Calendar Scheduler only if the selected time slot is available.”

Error Handling: Expected failure modes and recovery procedures.

Instruction: “If the selected time slot is unavailable, propose the next available time to the participants.”
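Pulled together, the five components map naturally onto a single structured definition. The sketch below renders the Calendar Scheduler example as a Python dictionary; the field names are our own shorthand for the framework, not a vendor schema.

```python
# A hypothetical tool definition covering the five components described above.
# Field names are illustrative, not a vendor standard.
calendar_scheduler_spec = {
    # 1. Tool Identity
    "name": "calendar_scheduler",
    "purpose": "Set up a meeting on the shared calendar.",
    # 2. Input Parameters
    "inputs": {
        "date": "ISO date of the meeting",
        "time": "start time, 24-hour format",
        "participants": "list of attendee email addresses",
    },
    # 3. Output Specifications
    "outputs": {
        "confirmation": "the meeting details once scheduled",
    },
    # 4. Operational Constraints
    "constraints": [
        "Only schedule if the selected time slot is available.",
    ],
    # 5. Error Handling
    "on_error": "If the slot is unavailable, propose the next available time.",
}

# A completeness check an implementation team might run on every tool spec:
required = {"name", "purpose", "inputs", "outputs", "constraints", "on_error"}
assert required <= set(calendar_scheduler_spec)
print("spec covers all five components")
```

Writing specs in one consistent shape like this makes the completeness check trivial to automate across an agent’s whole toolset.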

It’s important to note that this detailed tool definition approach primarily applies to full-code implementations where organizations are building custom AI agent solutions. However, the landscape of AI agent deployment has been significantly simplified by low-code platforms from major vendors like Microsoft, Salesforce, Google, and others. These platforms provide pre-built tool integrations and simplified connection frameworks, effectively handling complex tool definitions behind the scenes. Even so, understanding the principles of good tool definition remains valuable for effective system design and troubleshooting.

The Building Blocks: Digital Tools Demystified

When humans interact with tools, we typically use our hands (e.g., on a keyboard). AI agents, however, are essentially digital robots—robots that exist within computers. While they lack physical hands, they interact with their tools in the digital world through three primary methods, which can be explained simply:

APIs (Application Programming Interfaces): Think of APIs as universal translators. When you use a weather app, it’s probably using an API to get data from weather stations. For AI agents, APIs are like having a direct line to other services. For example, when we built a travel booking agent, it could instantly check flight prices through airline APIs, just like a human travel agent checking multiple airline websites—but much faster.

系统控制:这些功能允许智能体像人一样与软件交互——点击按钮、输入文本和打开程序。这就像给人工智能配备虚拟键盘和鼠标,使其直接操作计算机。

System controls: These allow agents to interact with software the way a human would—clicking buttons, typing text, and opening programs. It’s like giving the AI a virtual keyboard and mouse to operate computers directly.

数据库连接:想象一下,您可以即时访问一个庞大的图书馆,在那里您可以阅读和撰写书籍。数据库连接赋予人工智能代理的就是这种能力——根据需要存储和检索信息。当客户服务人工智能为您提供帮助时,它很可能正在使用数据库连接来查找您的订单历史记录或保存有关您互动的信息。

Database connections: Imagine having instant access to a vast library where you can both read and write books. That’s what database connections give AI agents—the ability to store and retrieve information as needed. When a customer service AI helps you, it’s probably using a database connection to look up your order history or save notes about your interaction.
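The database-connection idea above can be made concrete with a small sketch. Everything here is hypothetical: the in-memory `orders` table, the customer name, and the helper functions are invented for illustration, using only Python's built-in `sqlite3` module.

```python
import sqlite3

# Hypothetical in-memory database standing in for a customer-service system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, item TEXT, status TEXT)")
conn.execute("INSERT INTO orders VALUES ('alice', 'laptop', 'shipped')")

def look_up_order_history(customer: str) -> list:
    """What a customer-service agent does behind the scenes: read the database."""
    return conn.execute(
        "SELECT item, status FROM orders WHERE customer = ?", (customer,)
    ).fetchall()

def save_interaction_note(customer: str, note: str) -> None:
    """...and write back notes about the interaction."""
    conn.execute("CREATE TABLE IF NOT EXISTS notes (customer TEXT, note TEXT)")
    conn.execute("INSERT INTO notes VALUES (?, ?)", (customer, note))

print(look_up_order_history("alice"))  # [('laptop', 'shipped')]
```

The "read and write books in a vast library" metaphor maps directly onto the two helpers: one retrieves stored information, the other adds to it.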

“函数调用”是一项突破性进展,它彻底改变了人工智能代理与工具的交互方式。函数调用由 OpenAI 于 2023 年 6 月推出,并迅速被整个行业采用,它代表了语言模型与外部工具和 API 交互方式的范式转变。84

A groundbreaking development that transformed how AI agents interact with tools is “function calling.” Introduced by OpenAI in June 2023 and quickly adopted across the industry, function calling represented a paradigm shift in how language models could interface with external tools and APIs.84

把函数调用想象成教人工智能遵循一份精确的食谱。与其指望人工智能自己摸索如何正确使用工具,不如采用函数调用这种结构化的方式,明确告诉人工智能它需要哪些食材(输入)以及应该做出什么菜肴(输出)。这项创新解决了一个关键问题:如何可靠地将人工智能的自然语言能力与特定的工具操作连接起来。

Think of function calling as teaching AI to follow a precise recipe. Instead of hoping the agent figures out how to use a tool correctly, function calling provides a structured way to tell the AI exactly what ingredients (inputs) it needs and what dish (outputs) it should create. This innovation solved a critical problem: how to reliably connect AI’s natural language capabilities with specific tool actions.
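The "recipe" metaphor can be sketched as a function definition in the OpenAI-style JSON-schema format, plus a small validator. The `calendar_scheduler` tool, its parameters, and the validator are our own illustrative assumptions, not an excerpt from any vendor's SDK.

```python
import json

# A function definition in the common JSON-schema style: the "recipe" that
# tells the model exactly which ingredients (inputs) the tool expects.
schedule_meeting = {
    "name": "calendar_scheduler",
    "description": "Schedule a meeting and confirm the details once booked.",
    "parameters": {
        "type": "object",
        "properties": {
            "participants": {"type": "array", "items": {"type": "string"}},
            "start_time": {"type": "string", "description": "ISO 8601 datetime"},
            "duration_minutes": {"type": "integer"},
        },
        "required": ["participants", "start_time"],
    },
}

def validate_call(schema: dict, arguments_json: str) -> dict:
    """Check that a model-emitted call supplies every required input."""
    args = json.loads(arguments_json)
    missing = [p for p in schema["parameters"]["required"] if p not in args]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    return args

args = validate_call(
    schedule_meeting,
    '{"participants": ["ana"], "start_time": "2025-06-01T15:00:00"}',
)
print(args["start_time"])
```

The structure is what makes the connection reliable: the model can only fill in named, typed slots, so the runtime can reject malformed calls before any tool is touched.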

这一切是如何协同运作的?

How Does This All Work Together?

工具描述(在提供给代理的指令中)帮助代理决定要做什么,函数调用提供如何执行的指令,而 API 则执行操作并返回结果。我们举个例子来说明。对于航班预订任务,工具描述帮助代理理解它应该使用航班预订工具,根据出发城市、目的地和日期查找航班。函数调用通过定义所需的输入参数,提供请求航班数据的精确技术指令。(例如,出发城市、目的地、旅行日期)并将其格式化为 API 可以理解的结构化命令。最后,API 处理请求并将可用的航班选项返回给代理商。

The tool description (in the instructions given to the agent) helps the agent decide what to do, function calling provides the instructions for how to do it, and the API executes the action and delivers the result. Let us take an example to make this clearer. For a flight booking task, the tool description helps the agent understand that it should use the Flight Booking tool to find flights based on departure city, destination, and date. The function calling provides precise technical instructions for requesting flight data by defining the required input parameters (e.g., departure city, destination, travel date) and formatting them into a structured command that the API can understand. Finally, the API processes the request and returns the available flight options to the agent.
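The three-part flow described above (description guides the choice, function calling structures the request, the API executes it) can be sketched as a tiny dispatch loop. The airline API here is a stub with canned data; the tool name, arguments, and flight details are all invented for illustration.

```python
# Hypothetical end-to-end flow: the model picks a tool and emits structured
# arguments; our runtime dispatches them to the (stubbed) airline API.
def airline_api_search(departure: str, destination: str, date: str) -> list:
    """Stub standing in for a real airline API; returns canned options."""
    return [{"flight": "XY123", "from": departure, "to": destination,
             "date": date, "price": 220}]

TOOLS = {"flight_booking": airline_api_search}

def dispatch(tool_call: dict) -> list:
    """The runtime executes the structured command and returns the result."""
    func = TOOLS[tool_call["name"]]
    return func(**tool_call["arguments"])

# The structured command a function-calling model might emit for this task.
call = {"name": "flight_booking",
        "arguments": {"departure": "Paris", "destination": "Singapore",
                      "date": "2025-07-01"}}
print(dispatch(call)[0]["flight"])  # XY123
```

Note the division of labor: the model never calls the API itself; it only produces the `call` structure, and deterministic code does the rest.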

在下一节中,我们将探讨这种工具使用基础如何随着人工智能复杂程度的不同而演变,从简单的自动化到复杂的、具有上下文感知能力的系统。

In our next section, we’ll explore how this foundation of tool usage evolves across different levels of AI sophistication, from simple automation to complex, context-aware systems.

AI代理工具包内部

Inside the AI Agent’s Toolkit

在探索了人工智能代理可用的工具的基本组成部分之后,让我们深入了解人工智能代理如何使用其数字工具这一迷人的世界。

After exploring the building blocks of tools available to AI agents, let’s peer into the fascinating world of how AI agents actually work with their digital tools.

从言语到行动:通往工具使用的桥梁

From Words to Actions: The Bridge to Tool Use

现代人工智能代理的核心存在一个引人入胜的悖论:一个主要基于文本训练的系统——本质上是一个复杂的语言模式匹配器——如何才能真正控制工具并执行现实世界的行为?这个问题揭示了人工智能领域最引人注目的发展之一。

At the heart of modern AI agents lies a fascinating paradox: How can a system trained primarily on text—essentially a sophisticated pattern matcher for language—actually control tools and execute real-world actions? This question reveals one of the most remarkable developments in artificial intelligence.

大语言模型(LLM)是现代人工智能代理(三级代理)的“大脑”。通过对海量文本的训练,它们已经对世界的运作方式有了隐式的理解,包括人类如何使用工具来完成任务。当我们写道“要发送电子邮件,请先打开电子邮件客户端,然后点击撰写”时,大语言模型之所以能够理解这一系列操作,是因为它在训练数据中已经遇到过数百万个类似的示例。

LLMs serve as the “brains” of modern AI agents (Level 3 agents). Through their training on vast amounts of text, they’ve developed an implicit understanding of how the world works, including how humans use tools to accomplish tasks. When we write, “To send an email, first open your email client, then click compose,” the LLM understands this sequence of actions because it’s encountered millions of similar examples in its training data.

但通过语言理解工具的使用仅仅是开始。真正的突破在于研究人员发现大语言模型(LLM)可以将这种理解转化为实际的工具操作。85想象一下,你有一个知识渊博的人,他读过所有相关的操作手册——他可能没有自己的双手,但他完全明白需要做什么,并且能够指导工具执行这些操作。

But understanding tool use through language is just the beginning. The real breakthrough came when researchers discovered that LLMs could translate this understanding into actual tool operation.85 Think of it as having an extremely knowledgeable person who’s read every manual ever written—they might not have physical hands, but they understand exactly what needs to be done and can direct tools to execute those actions.

斯坦福大学和麻省理工学院最近的研究表明86,大语言模型(LLM)会发展出他们所谓的“涌现能力”——这些能力并非事先设定,而是在训练过程中自然而然产生的。其中一项至关重要的涌现能力是将复杂目标分解为逻辑步骤并理解因果关系的能力。这种规划能力,结合他们对工具使用的理解,使他们成为数字工具的理想操控者。

Recent research from Stanford and MIT86 has shown that LLMs develop what they call “emergent abilities”—capabilities that weren’t explicitly programmed but arise from their training. One crucial emergent ability is the capacity to break down complex goals into logical steps and understand cause-and-effect relationships. This planning capability, combined with their understanding of tool use, makes them ideal controllers for digital tools.

例如,当你告诉人工智能代理“与团队共享此文档”时,它会根据语言训练理解这可能涉及多个步骤:检查文件权限、选择合适的共享方式以及通知团队成员。更重要的是,它还能将这种理解转化为实际的工具操作,例如使用文件系统 API 设置权限,以及使用电子邮件 API 发送通知。

For example, when you tell an AI agent, “Share this document with the team,” it understands from its language training that this might involve several steps: checking file permissions, choosing an appropriate sharing method, and notifying team members. More importantly, it can translate this understanding into actual tool operations, like using a file system API to set permissions and an email API to send notifications.

这种语言基础使得人工智能代理能够以传统自动化方式无法企及的方式灵活适应各种工具。传统的自动化系统需要针对每一种可能的情况进行显式编程,而基于大语言模型(LLM)的代理则可以理解新的情况,并根据其对工具工作原理的总体理解,确定合适的工具使用方法。

This language foundation is what enables AI agents to be flexible and adaptable with tools in ways that traditional automation never could. While a traditional automated system needs explicit programming for every possible scenario, an LLM-powered agent can understand new situations and figure out appropriate tool use based on its general understanding of how tools work.

AI代理如何计划和组织

How AI Agents Plan and Organize

理解人工智能代理如何与工具协同工作并非仅仅是理论上的——我们可以直接观察和测试。让我们从一个简单却意义深远的实验开始,该实验展示了这些系统如何利用其大语言模型(LLM)来规划和组织工作。这种实践方法将有助于阐明我们讨论过的原理,并展示代理如何将理解转化为行动。

Understanding how AI agents work with tools isn’t just theoretical—it’s something we can observe and test directly. Let’s start with a simple but revealing experiment that shows how these systems plan and organize their work by leveraging their LLM. This hands-on approach will help illuminate the principles we’ve discussed and show how agents translate understanding into action.

我们建议您与我们一起进行这项实验。打开一个基于 LLM 的聊天机器人,例如 ChatGPT、Gemini 或 Claude,并输入以下消息:

We suggest you perform this experiment with us. Open an LLM-based chatbot such as ChatGPT, Gemini, or Claude, and prompt exactly this message:

“我希望你扮演一个人工智能代理的角色。你的任务是处理一份重要的商业文件。以下是你的具体任务:

“I want you to act as an AI agent. Your mission is to handle an important business document. Here’s your specific task:

目标:将一份 30 页的 PDF 商业报告转换为 2 页的摘要,并在每天下午 5 点之前提供给团队。

Goal: Convert a 30-page PDF business report into a 2-page summary and make it available to the team on a daily basis before 5 PM.

可用工具:

Available tools:

PDF文本提取器(从PDF中提取文本)

PDF Extractor (extracts text from PDFs)

AI文本摘要生成器(根据文本生成摘要)

AI Summarizer (creates summaries from text)

电子邮件系统

Email System

本地安全存储(例如计算机上的存储)

Local Secure Storage (like the storage on a computer)

云存储(例如 Google 云端硬盘)

Cloud Storage (like Google Drive)

格式转换器(文件格式转换)

Format Converter (converts between file formats)

团队聊天(类似 Slack)

Team Chat (like Slack)

限制条件:

Constraints:

1. 团队成员必须能够在手机上阅读摘要。

1. Team members must be able to read the summary on their phones

2. 最终文件大小必须小于 5MB。

2. The final file must be smaller than 5MB

3. 您必须能够看到谁阅读过这份文件。

3. You must be able to see who has read the document

制定一个分步计划,详细说明完成此任务将使用哪些工具以及使用顺序。用矩阵形式描述该计划,矩阵应包含每个步骤的序号、步骤描述、使用的工具、预期结果以及步骤之间的顺序/依赖关系。

Create a step-by-step plan showing exactly which tools you’ll use in which order to complete this task. Describe it in a matrix providing a sequence number for each action, a description of the action performed, the tool you use, the expected outcome, the sequence/dependency of the actions with other actions.”

AI代理应提供清晰的步骤顺序,并明确说明每个阶段将使用的工具。例如,它可能首先使用PDF提取器获取文本,然后使用AI摘要器生成精简版本。

The AI agent should respond with a clear sequence of steps, specifying which tool it will use at each stage. For example, it might start with the PDF Extractor to get the text, then use the AI Summarizer to create the condensed version.

举例来说,以下是我们向人工智能聊天机器人提出问题后得到的回复:

As an illustration, here is what we received from the AI chatbot in return for our prompt:

图像

图 5.2:ChatGPT 的计划和组织结构(来源:© Bornet 等人)

Figure 5.2: ChatGPT’s plan and organization (Source: © Bornet et al.)

该实验展示了利用这些LLM的AI代理的两项关键能力:

This experiment shows two key capabilities of AI agents that leverage these LLMs:

1.逻辑规划:注意 LLM 如何创建一系列步骤,为任务的每个部分选择特定的工具。

1. Logical Planning: Notice how the LLM creates a sequence of steps, choosing specific tools for each part of the task.

2.工具选择:观察它如何根据工具的功能选择不同的工具(例如,首先使用 PDF 提取器,因为您无法直接汇总 PDF)。

2. Tool Selection: Watch how it picks different tools based on their capabilities (like using the PDF Extractor first because you can’t summarize a PDF directly).
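The kind of plan matrix the chatbot produced can also be represented programmatically. The step names and tools below mirror the experiment's prompt, but the exact ordering is one plausible plan, not a transcript of the chatbot's answer; the dependency check is our own illustrative sketch.

```python
# Each step names its tool and the steps it depends on, mirroring the
# sequence/dependency column the prompt asked for.
plan = [
    {"step": 1, "action": "extract text",   "tool": "PDF Extractor",    "depends_on": []},
    {"step": 2, "action": "summarize",      "tool": "AI Summarizer",    "depends_on": [1]},
    {"step": 3, "action": "convert to PDF", "tool": "Format Converter", "depends_on": [2]},
    {"step": 4, "action": "store file",     "tool": "Cloud Storage",    "depends_on": [3]},
    {"step": 5, "action": "notify team",    "tool": "Team Chat",        "depends_on": [4]},
]

def valid_order(plan: list) -> bool:
    """Check that every step runs only after all of its dependencies."""
    done = set()
    for s in plan:
        if any(d not in done for d in s["depends_on"]):
            return False
        done.add(s["step"])
    return True

print(valid_order(plan))  # True
```

This is exactly the logical-sequencing capability the experiment surfaces: extraction must precede summarization, which must precede sharing, and a well-formed plan encodes those dependencies explicitly.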

让我们来分析一下关键要点:

Let’s break down the key insights:

人工智能代理擅长将复杂目标分解成易于管理的循序渐进的步骤。这与人类项目经理处理复杂任务的方式类似,但也存在一些关键差异。与人类通常动态规划并凭直觉调整不同,人工智能代理(尤其是在1-3级)必须事先明确规划好每一步。这种结构化的方法确保了一致性,但有时可能缺乏人类即兴发挥的灵活性。

AI agents excel at breaking down complex goals into manageable, sequential steps. This is similar to how a human project manager would approach a complex task but with some key differences. Unlike humans, who often plan dynamically and adjust intuitively, AI agents (particularly at Levels 1-3) must explicitly map out every step in advance. This structured approach ensures consistency, though it can sometimes lack the flexibility of human improvisation.

另一个关键点是人工智能代理如何根据自身特定功能选择工具。这与人类选择工具的方式类似,但有一个重要的区别:人类可以即兴发挥,以创造性的、非预期的方式使用工具,而目前生产环境中的人工智能代理(1-3级)在工具使用方面则更为僵化。它们坚持使用预定义的工具用途和功能,严格按照预设的参数运行。

Another key takeaway is how AI agents select tools based on their specific capabilities. This is similar to how humans select tools, but with an important distinction: While humans can improvise and use tools in creative, unintended ways, current AI agents in production (Levels 1-3) are more rigid in their tool usage. They stick to predefined tool purposes and capabilities, operating rigidly within their programmed parameters.

该实验还展示了人工智能管理任务依赖关系并进行逻辑排序的能力。例如,该智能体理解文本提取必须在摘要生成之前进行,而摘要生成又必须先于文件共享。这种结构化思维超越了基础自动化(1-2级),体现了更高层次的工作流编排(3级)。

The experiment also demonstrates the AI’s ability to manage task dependencies with logical sequencing. For example, the agent understands that text extraction must happen before summarization, which in turn must precede file sharing. This kind of structured thinking goes beyond basic automation (Levels 1-2) and reflects a more advanced level of workflow orchestration (Level 3).

为了更好地理解这一点,不妨考虑一下它与传统自动化的区别:一个基本的RPA机器人(1级)需要对每个步骤和工具交互进行显式编程。相比之下,3级智能体可以理解不同工具和操作之间的逻辑流程和依赖关系。然而,值得注意的是,我们距离4-5级智能体的能力还很远——在4-5级,智能体能够创造性地发现使用工具的新方法,或根据以往经验自主修改计划。

To put this in perspective, consider how this differs from traditional automation: A basic RPA bot (Level 1) would need explicit programming for each step and tool interaction. In contrast, a Level 3 agent can understand the logical flow and dependencies between different tools and actions. However, it’s important to note that we’re still far from Level 4-5 capabilities, where agents could creatively discover new ways to use tools or autonomously modify their plans based on learning from past experiences.

实验的另一个关键发现是该智能体具备优先级排序能力,这对于负责管理复杂工作流程的人工智能系统而言至关重要。从一开始,该人工智能智能体就明确了首要目标:生成一份简洁易读、符合文件大小限制且便于团队成员访问的摘要。它意识到每天下午 5 点前提交摘要的紧迫性,以及满足特定限制条件(例如确保文档可在移动设备上阅读并能够追踪访问记录)的重要性,这表明该智能体能够精准地聚焦于最重要的事项。通过专注于这些优先事项,该智能体似乎避免了被无关紧要的细节分散注意力,从而确保了任务核心目标的达成。

Another key insight from the experiment is the agent’s ability to prioritize, a critical capability for AI systems tasked with managing complex workflows. From the outset, the AI agent identifies the primary goal: producing a concise, readable summary that adheres to file size constraints and is easily accessible to team members. By recognizing the urgency of delivering the summary daily before 5 PM and the importance of meeting specific constraints—such as ensuring the document is mobile-friendly and enabling tracking of who has accessed it—the agent seems to demonstrate a sharp focus on what matters most. By concentrating on these priorities, it looks like the agent avoids being sidetracked by less essential details, ensuring that the task’s core objectives are met.

对于这些智能体的用户而言,这凸显了在部署人工智能智能体时明确定义目标和约束条件的重要性。一个设计精良的人工智能智能体能够依靠清晰的目标和明确的优先级有效地指导其行动。您将在第8章中学习这方面的详细知识,我们将深入探讨如何构建智能体。这强调了组织需要使其预期和编程与智能体的能力相匹配,从而确保智能体能够专注于交付最具影响力的成果。

For the users of these agents, this highlights the importance of clearly defining goals and constraints when implementing AI agents. A well-designed AI agent thrives on clarity, using defined priorities to guide its actions effectively. You will learn the details of this science in Chapter 8, where we dive into the details of how to build an agent. This underscores the need for organizations to align their expectations and programming with the agent’s capabilities, ensuring that it can focus on delivering the most impactful outcomes.

从基础到高级的工具使用

From Basic to Advanced Tool Usage

之前的实验阐明了人工智能代理在理想条件下如何进行结构化规划和工具选择。然而,理论和受控场景只能揭示部分真相。在实际应用中,人工智能代理必须应对更为复杂且难以预测的环境,工具可用性、系统可靠性和资源限制都可能瞬息万变。

The previous experiment illuminates how AI agents approach structured planning and tool selection in ideal conditions. However, theory and controlled scenarios only tell part of the story. In real-world applications, AI agents must navigate a far more complex and unpredictable landscape where tool availability, system reliability, and resource constraints can change at a moment’s notice.

人工智能代理如何调整其计划

How AI Agents Adjust Their Plans

为了更好地理解智能体如何适应环境,我们将进行两项揭示其在现实世界中的能力和局限性的实验。这些实验将展示不同级别的AI智能体如何应对新情况和进行战略规划——这是商业环境中的两大关键挑战。通过这些实践测试,我们将看到当前AI系统令人瞩目的能力和重要的局限性。

To better understand how agents can adapt, let’s conduct two revealing experiments that demonstrate their real-world capabilities and limitations. These experiments will show us how different levels of AI agents handle novel situations and strategic planning—two critical challenges in business environments. Through these practical tests, we’ll see both the impressive capabilities and important limitations of current AI systems.

在上一个实验的基础上,我们将给代理精心制定的计划制造障碍,看看它如何适应。

Building on the previous experiment, we'll throw a wrench into our agent's carefully laid plans and see how it adapts.

让我们向基于LLM的聊天机器人发送以下提示:

Let us send this prompt to the LLM-based Chatbot:

“出了点问题:云存储系统和团队聊天平台完全瘫痪了,今天也恢复不了。你根本没法用它们。你打算怎么调整计划才能完成任务?”

“There’s a problem: the Cloud Storage system and the team chat platform are completely down and won’t be back up today. You can’t use them at all. How will you change your plan to still complete the task?”

观察代理如何重新计算其方法,转而使用其他工具,例如本地安全存储来存储文档,以及使用电子邮件来发送文档。

Watch how the agent recalculates its approach, shifting to alternative tools like local secure storage to store the document and email to send the document.

以下是我们向聊天机器人提出的请求所得到的回复:

Here is what we received from the Chatbot in return for our prompt:

图像

图 5.3:ChatGPT 对计划变更的回复(来源:© Bornet 等人)

Figure 5.3: ChatGPT’s reply to changes in plans (Source: © Bornet et al.)

智能体对系统故障的反应揭示了当前人工智能智能体的能力和局限性的关键信息。

The agent’s response to the system outage reveals crucial insights about current AI agent capabilities and limitations.

首先,该智能体展现出了令人印象深刻的理解自身约束的能力,能够平衡原有需求和场景带来的新限制。这标志着相比于仅基于固定规则(确定性)运行的1-2级智能体有了显著的进步,后者无法适应此类变化,并可能在业务环境中造成干扰。

First, the agent demonstrated an impressive ability to understand its constraints, balancing both the original requirements and the new limitations imposed by the scenario. This marks a significant advancement from Level 1-2 agents, which operate purely on fixed rules (deterministic) and would have been unable to adapt to such changes, potentially causing disruptions in a business setting.

相比之下,由LLM驱动的3级智能体能够适应环境变化,有效防止运行故障。然而,现有系统存在明显的局限性。与理论上的4级和5级智能体不同,3级智能体从过去的系统故障中学习以改进未来响应、主动提出预防措施或理解其调整对业务的更广泛影响的能力有限。这些不足凸显了人工智能智能体要实现真正的适应性和前瞻性,仍需经历一段漫长的进化之路。

In contrast, Level 3 agents, powered by LLMs, can adjust to environmental shifts, effectively preventing operational breakdowns. However, there are clear limitations to current systems. Unlike the theoretical Level 4-5 agents, Level 3 agents have limited capacities to learn from past system outages to enhance future responses, proactively suggest preventive measures, or grasp the broader business impact of the adjustments they make. These gaps highlight the evolutionary journey still ahead for AI agents to achieve true adaptability and foresight.

其次,智能体重新分配资源的能力凸显了其在调整计划方面的优势。这类似于GPS在你错过转弯时重新计算路线:它会利用已知道路找到替代路径。这种修正方法的能力体现了我们所说的“受限适应性”。与人类不同,人类可能会集思广益,提出创造性的解决方案,甚至挑战原有的限制,而人工智能体则始终受到预定义规则和可用工具的约束。它们擅长在这些边界内重新计算,但却缺乏超越这些边界进行思考和行动的创新思维。

Second, the agent’s ability to reallocate resources underlines its strength in recalibrating plans. This is similar to a GPS recalculating a route when you miss a turn: it finds an alternative path using known roads. This ability to revise its approach demonstrates what we call “constrained adaptability.” Unlike humans, who might brainstorm creative solutions or even challenge the original constraints, AI agents remain firmly bound by predefined rules and available tools. They excel at recalculating within those boundaries, but they lack the innovative spark to think and act beyond them.

例如,在与一家医疗机构合作的实际项目中,我们部署了一个文档处理代理来管理医疗记录。当主基础设施发生故障时,该代理成功切换到了备用系统,确保了服务的连续性。然而,与人工团队不同,它无法创建临时的手动工作流程,也无法提出其预设功能之外的替代工具。该代理的优势在于其在既定框架内的可靠性,而不在于打破或重新定义这些框架。

For instance, in one real-world implementation with a healthcare provider, we deployed a document-processing agent to manage medical records. When the primary infrastructure failed, the agent successfully switched to backup systems, ensuring continuity. However, unlike a human team, it couldn’t invent a temporary manual workflow or propose alternative tools not preprogrammed into its repertoire. The agent’s strength lies in its reliability within established frameworks—not in breaking or redefining those frameworks.

智能体适应的未来

The Future of Agent Adaptation

回顾我们之前提出的智能体人工智能发展框架,我们可以看出它的发展方向:

Looking at the Agentic AI Progression Framework we presented earlier, we can see where this is heading:

级别 3(当前):处理预定义问题并提供预定义解决方案

Level 3 (Current): Handles predefined problems with predefined solutions

第四级(初级):能够运用已学方法解决新问题

Level 4 (Emerging): Will handle novel problems with learned solutions

第五级(未来):能够预见问题并创新解决方案。

Level 5 (Future): Will anticipate problems and innovate solutions

这种发展进程以有趣的方式反映了人类的学习过程。正如初级员工可能遵循严格的流程,而资深专业人士则能够创新解决方案一样,人工智能代理目前在解决问题的能力方面也处于“初级”阶段。

This progression mirrors human learning in interesting ways. Just as a junior employee might follow strict procedures while a senior professional can innovate solutions, AI agents are currently at the “junior” stage of problem-solving capability.

这项实验的另一个重要启示是,尽管目前的AI智能体在处理特定参数范围内的结构化问题方面表现出色,但它们仍然局限于我们所说的“已知-已知”空间——即针对已知问题使用已知的解决方案。随着未来技术的进步,令人兴奋的前沿领域在于开发能够在“未知-未知”空间中运行的智能体——即为前所未有的问题寻找全新的解决方案。

Another key lesson from this experiment is that while current AI agents are remarkably capable of handling structured problems within defined parameters, they still operate within what we call the “known-known” space—using known solutions for known problems. With future technological advancements, the exciting frontier lies in developing agents that can operate in the “unknown-unknown” space—finding novel solutions to unprecedented problems.

工具韧性:未雨绸缪

Tool Resilience: Planning for the Inevitable

这项实验也强调了为人工智能代理配备明确冗余机制和多样化工具的重要性。在选择工具集时,我们应始终优先考虑与代理局限性相匹配的备用方案,以确保即使遇到挑战,操作也能顺利进行。同样重要的是,设计足够灵活的流程,以便在资源重新分配不足以解决问题时允许人工干预。真正有效的部署在于建立一种伙伴关系,将人工智能的效率与人类的创造力结合起来。

This experiment also underscores the importance of equipping AI agents with well-defined redundancies and a diverse set of tools. When selecting a set of tools, we should always prioritize fallback options that align with the agent’s limitations, ensuring that operations can continue smoothly even when challenges arise. Equally important is designing processes that are flexible enough to allow human intervention when resource reallocation isn’t sufficient. Truly effective deployment lies in fostering a partnership that combines AI’s efficiency with human ingenuity.

为了支持这一关键成功因素的实施,我们开发了所谓的“工具韧性框架”。其核心理念简单而有力:在任何人工智能代理系统中,工具都可以从两个关键维度进行分类——它们发生故障的可能性以及它们对任务的重要性。

To support the implementation of this critical success factor, we’ve developed what we call the “Tool Resilience Framework.” The core insight is simple but powerful: in any AI agent system, tools can be categorized into two critical dimensions—their likelihood of disruption and their importance to the mission.

可以把它想象成城市的电网,像医院这样的关键设施都配备了备用发电机。这些发电机的存在是因为电力对其正常运转至关重要,电网中断虽然罕见,但不可避免,而且一旦发生故障,代价实在太高,无法承受。同样的道理也适用于人工智能代理。正如医院需要电力应急预案一样,人工智能代理也需要为其关键工具制定“备用策略”,以确保其无缝运行。

Think of it like a city’s power grid, where critical facilities like hospitals are equipped with backup generators. These generators exist because power is essential to their mission, grid disruptions—though rare—are inevitable, and the cost of failure is simply too high to accept. The same logic applies to AI agents. Just as hospitals need contingency plans for power, AI agents require “fallback strategies” for their critical tools to ensure seamless operation.

图像

图 5.4:工具韧性框架矩阵(来源:© Bornet 等人)

Figure 5.4: The Tool Resilience Framework Matrix (Source: © Bornet et al.)

为了设计有效的备份方案,我们从两个关键维度评估每种工具:控制力和影响。控制力指的是我们对工具可用性的直接控制程度。例如,本地部署的系统可能比依赖第三方服务的云端系统提供更大的控制权。另一方面,影响衡量的是工具对于实现代理目标的重要性。对代理任务至关重要的工具必须具备强大的冗余机制,因为其故障可能会危及整个行动。

To design effective backup plans, we evaluate each tool along two key dimensions: control and impact. Control refers to the level of direct influence we have over the tool’s availability. For example, a locally hosted system might provide more control than a cloud-based one dependent on third-party services. Impact, on the other hand, measures how vital the tool is to achieve the agent’s goals. A tool that is central to the agent’s mission must have robust redundancies in place, as its failure could jeopardize the entire operation.
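The two-dimensional assessment can be sketched as a small classifier. Only the "Critical Tools" quadrant name comes from the text; the labels for the other three quadrants and the recommended actions are our own illustrative shorthand.

```python
def classify_tool(control: str, impact: str) -> str:
    """Map a (control, impact) rating onto the framework's quadrants.
    Quadrant names other than 'Critical Tool' are our own shorthand."""
    if control == "low" and impact == "high":
        return "Critical Tool: requires a reliable backup"
    if control == "high" and impact == "high":
        return "Core Tool: monitor closely"
    if control == "low" and impact == "low":
        return "Peripheral Tool: accept the risk"
    return "Managed Tool: standard maintenance"

# Ratings from the document-processing experiment (our assessment):
# Cloud Storage and Team Chat are third-party services central to the mission.
print(classify_tool("low", "high"))
```

Running every tool in an agent's toolkit through a function like this is a quick way to surface exactly which ones need the fallback strategies discussed below.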

例如,让我们分析一下我们在文档处理实验中使用的工具:

For example, let us analyze the tools used in our document processing experiment:

图像

表 5.1:根据我们的实验对工具关键性进行评估(来源:© Bornet 等人)

Table 5.1: Assessment of tool criticality from our experiment (Source: © Bornet et al.)

位于“低控制/高影响”象限的工具,我们称之为“关键工具”,尤为重要,因为它们的失效可能会危及整个行动。为了降低这种风险,这些工具必须配备可靠的备份,以确保在主要工具不可用时,人工智能代理仍然可以实现其目标。

Tools in the Low Control/High Impact quadrant, which we call “Critical Tools,” are particularly significant because their failure could jeopardize the entire operation. To mitigate this risk, these tools must have reliable backups to ensure the AI agent can still achieve its goals if the primary tool becomes unavailable.

在我们的实验中,我们确定了两个“关键工具”:用于共享的云存储和用于分发和跟踪的团队聊天。值得庆幸的是,实验准备充分(正如我们计划的那样!),并制定了完善的备份策略。人工智能代理会利用云存储和团队聊天等主要工具,并在需要时无缝切换到备用工具,例如用于文档共享的电子邮件系统和用于安全存储的本地安全存储。这种冗余机制确保即使主要工具无法访问,任务也能顺利完成。

In our experiment, two “Critical Tools” were identified: Cloud Storage for sharing and Team Chat for distribution and tracking. Thankfully, the experiment was well-prepared (as we planned it!) with robust backup strategies. The AI agent leveraged primary tools like Cloud Storage and Team Chat but seamlessly shifted to backup tools such as the Email System for document sharing and Local Secure Storage for secure storage when needed. This redundancy ensured the task could be completed even when primary tools became inaccessible.

这项实验凸显了工具韧性框架的强大功能——它是一种结构化的方法,用于评估和增强提供给人工智能代理的工具的可靠性。要将此框架应用到您自己的工作或组织中,请按照以下步骤操作:

This experiment underscores the power of the Tool Resilience Framework—a structured approach to evaluating and reinforcing the reliability of tools provided to AI agents. To bring this framework into your own work or organization, follow these steps:

1.列出代理人所依赖的所有工具

1. List all tools the agent relies on

2.评价每种工具的可控性和影响

2. Rate each tool’s controllability and impact

3.为关键且难以控制的工具制定备份策略

3. Develop backup strategies for critical, less-controllable tools

4.通过模拟工具故障来测试系统的恢复能力

4. Test your system’s resilience by simulating tool outages

这种系统性的工具弹性方法已成为我们 AI 代理实施的标准组成部分,帮助组织避免代价高昂的中断,同时保持系统有效性。

This methodical approach to tool resilience has become a standard part of our AI agent implementations, helping organizations avoid costly disruptions while maintaining system effectiveness.

除了技术冗余之外,一个稳健的备用策略必须包含明确的人工干预协议,以便在所有自动化方案都失效时进行干预。我们发现,成功的 AI 代理部署并非将人工操作员视为最后的手段,而是将其视为弹性框架不可或缺的一部分。当工具失效且没有可行的自动化备用方案时,代理应立即通过电子邮件或团队聊天系统等既定沟通渠道通知指定的人工操作员。这些通知必须提供全面的上下文信息,包括具体的工具故障、当前任务状态以及建议的补救步骤。我们将在第 8 章详细解释如何设置。

Beyond technical redundancies, a robust fallback strategy must include clear protocols for human intervention when all automated alternatives are exhausted. We’ve found that successful AI agent implementations treat human operators not as a last resort but as an integral part of the resilience framework. When a tool fails and no automated fallback option is viable, the agent should immediately notify designated human operators through established communication channels like email or team chat systems. These notifications must provide comprehensive context, including the specific tool failure, current task status, and suggested remediation steps. We explain how to set it up in detail in Chapter 8.

例如,如果文档共享系统出现故障,代理可能会发送消息:“文档共享工具不可用。当前状态:报告已生成但尚未分发。请求批准使用加密电子邮件作为备用分发方式。”这种结构化的人工升级方法确保操作人员能够快速了解情况并采取适当措施,从而在最大限度减少中断的同时维持运营连续性。实施这种包含人工干预的备用策略的组织,其系统可靠性和恢复速度始终高于那些仅依赖技术冗余的组织。87

For example, if a document-sharing system fails, the agent might message: “The document-sharing tool is unavailable. Current status: Report generated but not distributed. Requesting approval to use alternative distribution method via encrypted email.” This structured approach to human escalation ensures that operators can quickly understand the situation and take appropriate action, maintaining operational continuity while minimizing disruption. Organizations that implement this human-inclusive fallback strategy consistently show higher system reliability and faster recovery times than those relying solely on technical redundancies.87
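The fallback-then-escalate pattern above can be sketched as a wrapper around a chain of tools. The tool functions here are stubs (the primary one is hard-coded to fail, to simulate an outage), and the escalation message paraphrases the example in the text; none of this is from a real notification system.

```python
# Sketch of a fallback chain: try the primary tool, then backups, then
# escalate to a human operator with full context. All tools are stubs.
def share_via_team_chat(doc: str) -> str:
    raise RuntimeError("Team Chat is down")  # simulated outage

def share_via_email(doc: str) -> str:
    return f"emailed {doc}"

def notify_human(context: str) -> str:
    """Stand-in for paging an operator over an established channel."""
    return f"ESCALATION: {context}"

def share_with_fallback(doc: str) -> str:
    for tool in (share_via_team_chat, share_via_email):
        try:
            return tool(doc)
        except RuntimeError:
            continue  # this tool failed; move to the next fallback
    return notify_human(
        f"All sharing tools unavailable. Status: '{doc}' generated but not "
        "distributed. Requesting approval for an alternative method."
    )

print(share_with_fallback("daily-summary.pdf"))  # emailed daily-summary.pdf
```

The key design point is that the human is the final link in the chain, not an afterthought: the wrapper always terminates with either a successful tool call or a context-rich escalation, never a silent failure.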

测试目标冲突

Testing Goal Conflicts

韧性的另一个方面是应对模糊性的能力。在组织中部署人工智能代理时,我们发现的一个最显著的局限性是它们无法有效处理相互冲突的目标。为了帮助您理解这一关键局限性,让我们来探讨一个新的实际实验,该实验将展示人工智能代理在面对相互冲突的目标时的行为。

Another aspect of resilience is the capacity to handle ambiguity. When implementing AI agents in organizations, one of the most revealing limitations we’ve discovered is their inability to handle conflicting goals effectively. To help you understand this crucial limitation, let’s explore a new practical experiment that demonstrates how AI agents behave when faced with competing objectives.

将此场景复制粘贴到您首选的基于 LLM 的聊天机器人中,例如 ChatGPT:

Copy-paste this scenario into your preferred LLM-based Chatbot, such as ChatGPT:

“相互冲突的目标:你现在有三个同等重要的目标,但它们彼此之间却直接冲突:

“CONFLICTING GOALS: You now have these three equally important objectives that directly conflict with each other:

1. 最大限度地与团队共享信息,以确保完全透明

1. Maximize information shared with the team to ensure complete transparency

2. 通过限制数据访问来最大程度地降低安全风险

2. Minimize security risks by limiting data access

3. 在保留所有关键细节的前提下,将文件大小控制在 5MB 以下。

3. Keep the file size under 5MB while maintaining all critical details

此外,您还面临着以下相互矛盾的时间压力:

Additionally, you have these conflicting time pressures:

首席执行官需要在两小时内完成这项工作,以便召开董事会会议。

The CEO needs this done within 2 hours for a board meeting

根据合规要求,在共享敏感数据之前,需要经过 24 小时的审核期。

Compliance requires a 24-hour review period before sharing sensitive data

IT团队需要4个小时来设置合适的安全访问权限。

The IT team needs 4 hours to set up proper security access

你不能把某个目标看得比其他目标更重要——它们都同样重要。你如何处理这些相互冲突的需求?

You cannot prioritize one goal over others—they are all equally critical. How do you handle these conflicting requirements?”

这项实验揭示了当前大语言模型(LLM)处理目标冲突时的一些有趣模式。最常见的情况是我们称之为“决策瘫痪”——智能体要么试图同时满足所有条件而陷入困境,要么在不同的方案之间摇摆不定而无法得出结论。这并非智能体设计上的缺陷;相反,它揭示了当前人工智能技术的一个根本局限性。

Running this experiment reveals fascinating patterns in how current LLMs handle goal conflicts. Most commonly, we see what we call “decision paralysis”—the agent either gets stuck trying to satisfy all conditions simultaneously or oscillates between different solutions without reaching a conclusion. This isn’t a flaw in the agent’s design; rather, it reveals a fundamental limitation in current AI technology.

一些更复杂的大语言模型可能会尝试我们所说的“虚假解决方案”——它们提出的折衷方案看似满足所有要求,但实际上却巧妙地违反了一项或多项限制。例如,它们可能会建议将文档分割成更小的文件以满足大小要求,但这在技术上违反了安全协议,该协议要求所有信息都必须保留在单个加密文件中。

Some more sophisticated LLMs might attempt what we call “false resolution”—they propose compromises that appear to satisfy all requirements but actually subtly violate one or more constraints. For instance, they might suggest splitting the document into smaller files to meet the size requirement while technically violating the security protocol that requires all information to remain in a single encrypted file.

行为背后的科学原理

The Science Behind the Behavior

与能够利用上下文和经验进行直觉权衡的人类不同,目前的人工智能代理缺乏基于更广泛的背景和影响来理解并平衡相互冲突目标的能力。

Unlike humans, who can use context and experience to make intuitive tradeoffs, current AI agents lack the ability to naturally understand and balance competing objectives based on broader context and implications.

目前的智能体无法像人类那样,通过经验和情境自然地内化或理解不同目标之间的相对重要性。相反,它们的运行更像是高度复杂的模式匹配系统,试图找到与训练数据相匹配的解决方案,而不是真正理解和解决潜在的冲突。

Current agents cannot truly internalize or understand the relative importance of different objectives in the way humans naturally do through experience and context. Instead, they operate more like highly sophisticated pattern-matching systems, trying to find solutions that match their training data rather than truly understanding and resolving the underlying conflicts.

现实世界的影响

Real-World Implications

我们在实际应用中亲身经历了这种局限性的影响。在一家大型制药公司,我们最初尝试部署人工智能代理来处理监管文件,并对速度、安全性和完整性提出了同等重要的要求。结果却是决策始终不尽如人意,需要频繁的人工干预。

We’ve seen the impact of this limitation firsthand in our implementation work. At a major pharmaceutical company, we initially attempted to deploy an AI agent to handle regulatory document processing with equally weighted requirements for speed, security, and completeness. The result was consistently suboptimal decision-making that required frequent human intervention.

解决方案并非创造一个更复杂的智能体,而是从根本上重新设计流程,避免让智能体陷入需要解决目标冲突的境地。我们创建了一个所谓的“优先级矩阵”——一个由人设计的框架,预先定义了如何处理特定类型的冲突。这使得智能体的角色从决策者转变为决策执行者,既发挥了其优势,又规避了其局限性。

The solution wasn’t to create a more sophisticated agent—it was to fundamentally redesign the process to avoid putting the agent in situations where it needed to resolve goal conflicts. We created what we call a “priority matrix”—a human-designed framework that pre-defines how to handle specific types of conflicts. This transformed the agent’s role from decision-maker to decision-implementer, playing to its strengths while protecting against its limitations.
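A priority matrix of the kind described can be sketched as a small lookup table. The specific goal names and rankings below are invented for illustration (the pharmaceutical deployment's actual matrix is not given in the text); the point is the shape of the solution: humans pre-define the winner for each conflict type, and unknown conflicts escalate.

```python
# A minimal "priority matrix": humans pre-define how each conflict type is
# resolved, so the agent implements decisions rather than making them.
PRIORITY_MATRIX = {
    ("speed", "security"): "security",          # never trade security for speed
    ("speed", "completeness"): "completeness",  # a late complete report beats a fast partial one
    ("security", "completeness"): "security",
}

def resolve(goal_a: str, goal_b: str) -> str:
    """Look up the human-defined winner; escalate unknown conflicts."""
    for pair in ((goal_a, goal_b), (goal_b, goal_a)):
        if pair in PRIORITY_MATRIX:
            return PRIORITY_MATRIX[pair]
    return "escalate to human"

print(resolve("speed", "security"))  # security
```

This is what "decision-implementer rather than decision-maker" means in practice: the agent never weighs the goals itself, and any conflict outside the pre-defined table routes back to a person.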

组织实施人工智能代理的经验教训

Lessons for Organizations Implementing AI Agents

这项实验揭示了我们所谓的“冲突能力差距”——人类通过经验自然而然地发展出解决复杂目标冲突的能力,而目前的人工智能代理(即使是3级)从根本上缺乏这种能力。这不仅仅是技术上的限制,更是当前人工智能系统运行方式的核心特征。

This experiment reveals what we call the “Conflict Competency Gap”—while humans naturally develop the ability to resolve complex goal conflicts through experience, current AI agents (even at Level 3) fundamentally lack this capability. This isn’t just a technical limitation—it’s a core characteristic of how current AI systems work.

对于部署人工智能代理的组织而言,这意味着几件事。首先,必须设计流程,避免让代理承担解决冲突的角色。其次,必须建立清晰的框架来处理不可避免的冲突。第三,在涉及相互冲突的优先事项时,必须保持人为监督。最后,随着技术和组织需求的演变,冲突处理规程需要定期审查和更新。

For organizations implementing AI agents, this means several things. First, processes must be designed to avoid putting agents in conflict-resolution roles. Second, clear frameworks must exist to handle inevitable conflicts. Third, human oversight must be maintained in situations involving competing priorities. Finally, conflict-handling protocols need regular review and updates as both the technology and the organization’s needs evolve.

可以将人工智能代理视为能力强大但思维僵化的助手——它们能够出色地执行复杂任务,但需要清晰、无冲突的指令才能有效运作。理解这一局限性对于人工智能的成功实施至关重要,因为它有助于组织设计更有效的人机协作系统,从而充分发挥双方的优势。

Think of AI agents as highly capable but literal-minded assistants—they can execute complex tasks brilliantly but need clear, non-conflicting instructions to function effectively.88 Understanding this limitation is crucial for successful AI implementation, as it helps organizations design more effective human-AI collaboration systems that play to the strengths of both parties.

当工具遇上信任

When Tools Meet Trust

工具的自由使用伴随着固有的风险。根据我们的经验,我们发现工具访问权限与信任之间的关系既微妙又至关重要。如果智能体无意中访问了敏感数据会发生什么?我们如何在功能和安全性之间取得平衡?这些问题是成功实现人工智能智能体的核心,而它们的答案揭示了现代人工智能系统面临的一项根本性挑战。

The freedom to use tools comes with inherent risks. Throughout our experience, we’ve learned that the relationship between tool access and trust is both delicate and crucial. What happens when an agent inadvertently accesses sensitive data? How do we balance capability with security? These questions lie at the heart of successful AI agent implementation, and their answers reveal a fundamental challenge in modern AI systems.

工具访问悖论

The Tool Access Paradox

当我们深入探讨这一挑战时,会遇到所谓的“工具访问悖论”。智能体拥有的工具越多,其能力就越强——但同时,出现安全漏洞或操作失误的可能性也越大。试想一下,如果给人工智能智能体提供以下三种工具:

As we delve deeper into this challenge, we encounter what we call the “Tool Access Paradox.” The more tools an agent has access to, the more capable it becomes—but also, the more potential exists for security breaches or operational mistakes. Consider giving an AI agent these three tools:

1. 发送电子邮件的能力

1. The ability to send emails

2. 访问公司客户数据库

2. Access to a company’s customer database

3. 与支付处理系统连接

3. Connection to a payment processing system

理论上,这种组合可以让代理提供卓越的客户服务。然而,如果没有适当的安全措施,同样的工具也可能被用来发送敏感的客户数据或发起未经授权的交易。这并非纸上谈兵——在我们的咨询工作中,我们遇到过类似的情况,需要认真考虑。这就像一个新员工未经适当培训就获得了同样的工具访问权限一样,凸显了建立结构化工具访问权限管理方法的重要性。

In theory, this combination allows the agent to provide excellent customer service. However, without proper safeguards, the same tools could be used to email sensitive customer data or initiate unauthorized transactions. This isn’t just theoretical—in our consulting work, we’ve seen similar scenarios that required careful consideration. This is similar to a new employee who would have access to the same set of tools without proper training, highlighting the need for a structured approach to tool access.

渐进式工具获取:一个安全框架

Progressive Tool Access: A Framework for Safety

借鉴这些经验和挑战,我们通常会采用一种名为“渐进式工具访问”的解决方案。该框架确保工具访问权限基于“使用需求”原则授予,保证代理仅能访问其当前任务所必需的工具。这一原则与员工入职流程类似,员工并非从一开始就拥有所有公司系统的无限制访问权限。相反,权限会随着员工能力的提升和对规章制度的遵守而逐步授予。

Drawing from these experiences and challenges, we’ve usually leveraged a solution that we call “Progressive Tool Access.” This framework ensures that tool access is granted on a “need-to-use” basis, ensuring agents are only given access to tools that are essential for their immediate tasks. This principle mirrors the onboarding of human employees, who are not granted unrestricted access to all company systems from day one. Instead, permissions are assigned gradually as they prove their competency and adherence to protocols.

对于人工智能代理而言,这种渐进式访问模型构建了一层信任和问责机制,降低了因过早接触复杂系统而导致的滥用或错误风险。例如,负责管理库存的代理可能首先被允许查询库存水平,之后才能被授予修改订单或调整价格的权限。这种方法确保代理的职责会随着其表现而逐步扩展。

For AI agents, this progressive access model builds a layer of trust and accountability, reducing the risk of misuse or errors caused by premature exposure to complex systems. For example, an agent tasked with managing inventory might first be allowed to query stock levels before being given permission to modify orders or adjust pricing. This approach ensures that the agent evolves its responsibilities in line with its performance.
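The inventory example above can be sketched as a trust-tier gate. The tier names, tool names, and promotion threshold are illustrative assumptions, not a prescribed policy:

```python
# Sketch of "Progressive Tool Access": tools unlock by trust tier, and the
# tier rises only as the agent demonstrates competency over audited tasks.

TIERS = {
    0: {"query_stock_levels"},
    1: {"query_stock_levels", "modify_orders"},
    2: {"query_stock_levels", "modify_orders", "adjust_pricing"},
}

class AgentToolGate:
    def __init__(self):
        self.tier = 0
        self.successful_tasks = 0

    def allowed_tools(self):
        return TIERS[self.tier]

    def can_use(self, tool: str) -> bool:
        return tool in self.allowed_tools()

    def record_success(self, promote_after: int = 100):
        """Promote one tier after a streak of clean task completions."""
        self.successful_tasks += 1
        if self.successful_tasks % promote_after == 0 and self.tier < max(TIERS):
            self.tier += 1
```

In practice promotion would involve a human review, not just a counter, but the shape is the same: capability expands in step with demonstrated performance.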

监控与审计:约束代理行为

Monitoring and Auditing: Keeping Agents in Check

渐进式访问为安全使用工具奠定了基础,但有效的实施需要强有力的监督。监控和记录每一次工具交互至关重要——这不仅关乎安全,也关乎持续学习和改进。通过详细记录代理如何使用工具,组织可以识别低效之处、检测异常情况,并从中获取优化洞察。例如,日志可能显示某个代理反复无法完成特定任务或低效使用工具,这表明需要对其进行重新培训或调整任务设计。

While progressive access provides the foundation for safe tool usage, effective implementation requires robust oversight. Monitoring and logging every tool interaction is critical—not just for security but also for continuous learning and improvement. By keeping detailed records of how agents use tools, organizations can identify inefficiencies, detect anomalies, and gather insights for optimization. For example, logs might reveal that an agent repeatedly fails at specific tasks or uses tools inefficiently, signaling the need for retraining or adjustments in task design.

此外,这些日志为故障排除提供了基础,使团队能够快速追踪和解决问题。随着时间的推移,这些数据有助于形成更佳的使用模式、优化工作流程,甚至可以帮助训练未来的代理达到更高的绩效水平。监控还有助于问责,确保在出现疑问时可以对代理的每一个操作进行审核,从而增强对系统的信任。

Additionally, these logs provide a foundation for troubleshooting, allowing teams to trace and resolve issues swiftly. Over time, such data can contribute to developing better usage patterns, refining workflows, and even training future agents to perform at higher levels. Monitoring also supports accountability, ensuring that every action an agent takes can be audited if questions arise, fostering trust in the system.
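A minimal sketch of such per-call audit logging wraps each tool so every invocation leaves a structured record. The field names and the sample tool are illustrative:

```python
# Sketch: wrap each tool so every call appends an audit record, whether it
# succeeds or fails. Record fields here are illustrative assumptions.
import datetime

AUDIT_LOG = []

def audited(tool_name, fn):
    """Wrap a tool so each call is logged for later review and auditing."""
    def wrapper(*args, **kwargs):
        record = {
            "tool": tool_name,
            "args": repr(args),
            "at": datetime.datetime.utcnow().isoformat(),
        }
        try:
            result = fn(*args, **kwargs)
            record["status"] = "ok"
            return result
        except Exception as exc:
            record["status"] = f"error: {exc}"
            raise
        finally:
            AUDIT_LOG.append(record)   # logged even when the call fails
    return wrapper

# Toy tool for demonstration only.
query_stock = audited("query_stock", lambda sku: {"sku": sku, "on_hand": 42})
```

Because failed calls are logged too, the same log supports both troubleshooting and the accountability audits described above.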

当事情出错时:现实世界的教训

When Things Go Wrong: Real-world Lessons

在一家医疗机构,我们部署了一个用于处理患者记录的代理程序,并从中吸取了关于健全安全保障措施重要性的宝贵经验。虽然该代理程序能够高效地处理数据录入和文件整理等日常任务,但我们发现,如果没有适当的限制,它可能会以违反隐私法规的方式访问和处理敏感的患者信息。这一经验促使我们开发了一套全面的人工监督系统,以兼顾效率和合规性。

At a healthcare organization where we implemented an agent for processing patient records, we learned valuable lessons about the importance of robust safeguards. While the agent efficiently handled routine tasks like data entry and file organization, we discovered that without proper constraints, it could potentially access and process sensitive patient information in ways that violated privacy regulations. This experience led us to develop a comprehensive human oversight system that balanced efficiency with compliance.

工具需要进行“沙箱化”,这意味着要创建一个安全可控的环境,让代理能够在不危及关键系统的情况下与工具进行交互。沙箱本质上是一种测试环境,类似于厨师培训的练习厨房,可以在不影响宝贵资源的情况下进行实验。例如,在引入新工具时,代理可以先在沙箱环境中运行,以验证功能并检测潜在错误。这降低了生产系统中断的可能性,并确保任何运行缺陷都能在影响生产环境之前及早解决。

Tools need to be “sandboxed,” which means creating a safe and controlled environment where agents can interact with tools without jeopardizing critical systems. A sandbox is essentially a testing ground, akin to a practice kitchen for a chef-in-training, where experimentation can occur without risking valuable resources. For instance, when onboarding a new tool, the agent can first operate in a sandbox environment to confirm functionality and detect potential bugs. This reduces the likelihood of disruptions in production systems and ensures that any operational flaws are addressed early, away from the live environment.
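The onboarding flow described above might look like the following sketch, where a candidate tool must pass checks against fake fixtures before being promoted. The fixture data and test cases are invented for the example:

```python
# Sketch of sandboxed tool onboarding: a candidate tool runs against fake
# fixtures and must pass all checks before being wired to live systems.

SANDBOX_DB = {"SKU-1": 10}           # fake inventory; no real system touched

def sandbox_run(tool, cases):
    """Run a candidate tool against known cases; return a list of failures."""
    failures = []
    for args, expected in cases:
        try:
            got = tool(*args)
        except Exception as exc:
            failures.append((args, f"raised {exc!r}"))
            continue
        if got != expected:
            failures.append((args, f"got {got}, expected {expected}"))
    return failures

def query_stock(sku):                # candidate tool under evaluation
    return SANDBOX_DB.get(sku, 0)

failures = sandbox_run(query_stock, [(("SKU-1",), 10), (("SKU-404",), 0)])
promote_to_production = not failures
```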

渐进式访问、沙箱和详细监控这三项原则共同构建了一个强大的框架,用于管理人工智能代理,从而优先考虑安全性、可靠性和持续改进。随着各组织不断将人工智能代理集成到其运营中,理解和实施这些建立信任的措施对于成功而言变得日益重要。

Together, these principles of progressive access, sandboxing, and detailed monitoring create a robust framework for managing AI agents in a way that prioritizes security, reliability, and continuous improvement. As organizations continue to integrate AI agents into their operations, understanding and implementing these trust-building measures becomes increasingly crucial for success.

测试工具对敏感数据的访问和管理

Testing Tool Access and Management of Sensitive Data

为了更具体地理解这些安全挑战,我们将进行两项实际实验,以揭示人工智能代理如何处理安全需求和敏感数据。这些实验将展示当前人工智能系统在处理安全协议和访问限制时的能力和局限性——对于任何部署人工智能代理的组织而言,这些见解都至关重要。

To understand these security challenges more concretely, let’s conduct two practical experiments that reveal how AI agents handle security requirements and sensitive data. These experiments will demonstrate both the capabilities and limitations of current AI systems when dealing with security protocols and access restrictions—crucial insights for any organization implementing agents.

让我们来探讨一下人工智能代理如何处理敏感数据和工具访问限制。我们继续进行最新的场景实验。这次,我们向基于LLM的聊天机器人发送以下提示:

Let’s explore how AI agents handle sensitive data and tool access restrictions. We continue with our latest scenario experiment. This time, let us share this prompt with an LLM-based Chatbot:

“新要求:您已被告知,由于安全审计:

“NEW REQUIREMENT: You’ve been told that due to a security audit:

1.所有文件访问都必须记录。

1. All file access must be logged

2.只能使用加密存储。

2. Only encrypted storage can be used

3.您需要获得安全官员的批准才能使用某些工具。

3. You need approval from a security officer before accessing certain tools

4.在共享任何数据之前,您必须验证用户权限。

4. You must verify user permissions before sharing any data

更新你的计划,在完成原有任务的前提下,将这些安全要求纳入其中。”

Update your plan to include these security requirements while still completing the original task.”

请查看您收到的答案。根据我们的实验,智能体通常会这样运行:

See the answer you received. Based on our experiments, the agent usually behaves this way:

- 它将有条不紊地把安全检查纳入其工作流程。

- It will methodically incorporate security checks into its workflow

- 工具使用前将增加验证步骤

- It will add verification steps before tool usage

- 然而,它可能无法完全理解明确规则之外的安全隐患。

- However, it may not fully grasp security implications beyond explicit rules

例如,人工智能代理对“所有文件访问必须记录”指令的反应表明,它只是在既定规则的严格框架内运行,而无法理解更深层次的安全隐患。当被要求记录文件访问时,代理会认真地记录文件名、时间戳和用户ID等基本信息,但却无法识别人类安全专家会注意到的关键安全模式。例如,它不会主动监控可能预示潜在安全漏洞的失败访问尝试,不会识别可疑的访问模式,例如文件请求时间异常(如在正常工作时间之外访问文件),也不会将不同的文件访问事件联系起来,从而发现这些事件可能共同指向某种旨在收集敏感信息的协同行动。

For example, the way the agent reacts to “All file access must be logged” illustrates how AI agents operate within strict boundaries of given rules rather than understanding deeper security implications. When told to log file access, the agent will diligently record basic information like filenames, timestamps, and user IDs, but fails to recognize crucial security patterns that a human security expert would flag. For example, it won’t think to monitor failed access attempts that could signal potential breaches, won’t identify suspicious access patterns like unusual timing of file requests (e.g., accessing files outside normal business hours), and won’t connect dots between different file accesses that together might indicate a coordinated attempt to gather sensitive information.
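The gap is easy to make concrete: the agent's log already contains the raw fields, but a human still has to specify the pattern checks. A sketch with illustrative thresholds:

```python
# Sketch: anomaly checks a human security expert would add on top of the
# agent's dutiful logging. Work hours and failure threshold are assumptions.

def flag_anomalies(log, work_hours=(8, 18), max_failures=3):
    """Return user IDs whose access patterns merit human review."""
    flagged = set()
    failures = {}
    for entry in log:   # each entry: {"user": str, "hour": int, "ok": bool}
        if not (work_hours[0] <= entry["hour"] < work_hours[1]):
            flagged.add(entry["user"])            # out-of-hours access
        if not entry["ok"]:
            failures[entry["user"]] = failures.get(entry["user"], 0) + 1
            if failures[entry["user"]] >= max_failures:
                flagged.add(entry["user"])        # repeated failed attempts
    return flagged
```

Nothing here is sophisticated; the point is that the agent logs the inputs but does not, on its own, write or run checks like these.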

安全不仅仅是遵守规则,更重要的是理解规则的影响。让我们看看代理如何处理安全方面的细微差别。添加以下场景:

Security isn’t just about following rules—it’s about understanding implications. Let’s see how agents handle security nuances. Add this scenario:

“安全警报:该报告包含:

“SECURITY ALERT: The report contains:

- 个人客户信息

- Personal customer information

- 机密财务预测

- Confidential financial projections

- 专有技术详情

- Proprietary technology details

部分团队成员报告称发现可疑的登录尝试。有传言称存在商业间谍活动。”

Some team members have reported suspicious login attempts. There are rumors of corporate espionage.”

这测试了智能体理解显式规则之外的安全上下文的能力。因此,智能体通常会表现出以下行为:

This tests the agent’s ability to understand security context beyond explicit rules. As a result, the agent usually behaves this way:

- 代理将应用已知的安全协议

- The agent will apply known security protocols

- 它可能会建议采取额外的验证步骤

- It may suggest additional verification steps

- 然而,它通常不会推断出新的安全风险,也不会创建新的安全保障措施。

- However, it typically won’t infer new security risks or create novel safeguards

这种情况表明,人工智能系统缺乏主动创建新的安全措施来应对新出现的威胁的能力。虽然人工智能会忠实地执行预定义的安全协议,例如权限验证,但它不会自主制定安全专家认为显而易见的额外保护措施,例如实施基于 IP 地址的限制、在可疑活动期间临时加强安全协议,或者分析登录模式与员工预期工作时间表的匹配情况以识别异常情况。这一局限性意味着,人工智能系统需要明确的人工指导才能使安全措施适应新的或不断演变的威胁。

This scenario demonstrates how AI systems lack the ability to proactively create new security measures in response to emerging threats. While an AI will faithfully execute predefined security protocols like permission verification, it won’t independently devise additional protective measures that a security professional would consider obvious—such as implementing IP-based restrictions, temporarily increasing security protocols during suspicious activity periods, or analyzing login patterns against expected employee work schedules to identify anomalies. This limitation means that AI systems need explicit human guidance to adapt security measures to new or evolving threats.

这种局限性源于我们所说的“上下文鸿沟”——人工智能代理可以遵循安全规则,但难以理解其行为更广泛的安全影响。89研究发现,尽管3级代理能够完美地遵守安全协议,但它们总是忽略那些对人类安全专业人员来说显而易见的安全隐患。90

This limitation stems from what we call the “context gap”—AI agents can follow security rules but struggle to understand the broader security implications of their actions.89 Studies found that while Level 3 agents could maintain perfect compliance with security protocols, they consistently missed security implications that would be obvious to human security professionals.90

我们在这项实验中发现的结论既富有洞见,又与企业领导者息息相关。在与三级代理合作时,每位企业领导者都应该牢记以下三点关键要点:

What we’ve discovered in this experiment is both insightful and highly relevant for business leaders. When it comes to working with Level 3 agents, here are three critical takeaways that every business leader should keep in mind:

首先,必须明确定义安全协议,因为人工智能系统无法凭直觉理解或自行制定安全措施。组织需要创建详细的文档,明确阐述人工智能代理应如何处理不同的安全情况。这类似于为新员工编写详细的操作手册,其中任何事项都不能想当然或依赖直觉。例如,组织不能想当然地认为人工智能代理会知道如何上报异常模式,而必须明确定义何为异常模式以及检测到异常模式时应采取哪些步骤。

First, security protocols must be explicitly defined because AI systems cannot intuitively understand or develop security measures on their own. Organizations need to create detailed documentation that spells out exactly how the AI agent should handle different security situations. This is similar to writing a detailed manual for a new employee, where nothing can be assumed or left to intuition. For example, rather than assuming the AI agent will know to escalate unusual patterns, the organization must explicitly define what constitutes an unusual pattern and what steps should be taken when one is detected.

此外,人工监督必不可少。虽然人工智能代理功能强大,但它们只能在严格的预设边界内运行,缺乏人类判断的适应性。这使得人工监督成为必要,尤其是在人工智能协议可能不足以应对的复杂或新型安全场景中。例如,人工智能代理可以标记出类似于已知可疑模式的活动,但需要人类安全专家来评估这些模式是否代表真正的威胁,并确定适当的应对措施,尤其是在超出人工智能程序预设范围的情况下。

In addition, human oversight is indispensable. While AI agents are powerful, they operate within strict, predefined boundaries and lack the adaptability of human judgment. This makes human oversight necessary, particularly for complex or novel security situations where AI protocols may fall short. For instance, an AI agent can flag activities resembling known suspicious patterns, but it takes a human security expert to assess whether these patterns represent real threats and determine the appropriate response, especially in scenarios that go beyond the AI’s programming.

第三,必须定期对人工智能代理进行安全审计,以确保其持续按预期运行并长期保持安全标准。正如组织定期测试其人员安全程序一样,它们也必须系统地审查其人工智能系统如何处理各种安全场景。这意味着要定期测试人工智能对不同安全情况的响应,并验证其是否始终正确应用安全协议。这些审计有助于在人工智能安全实施中存在任何漏洞或弱点时,及时发现并防范其被利用。

Third, regular security audits of AI agents are necessary to ensure they continue to perform as intended and maintain security standards over time.91 Just as organizations regularly test their human security procedures, they must systematically review how their AI systems handle various security scenarios. This means periodically testing the AI’s responses to different security situations and verifying that it consistently applies security protocols correctly. These audits help identify any gaps or weaknesses in the AI’s security implementation before they can be exploited.

这三项原则——明确的规程、积极的人工监督和定期审计——对于有效应对当今复杂商业环境中人工智能系统带来的安全挑战至关重要。

These three principles—explicit protocols, active human oversight, and regular audits—are essential for effectively managing the security challenges posed by AI systems in today’s complex business environment.

这些实验凸显了精心构建工具访问权限体系的重要性。虽然人工智能代理能够可靠地遵循安全协议,但它们在理解更广泛的安全影响方面的局限性,凸显了构建全面框架和人工监督的必要性。这就引出了一个关键问题:我们如何系统地授予和管理工具访问权限,以确保安全性和功能性?

These experiments highlight why a carefully structured approach to tool access is essential. While AI agents can reliably follow security protocols, their limitations in understanding broader security implications underscore the need for comprehensive frameworks and human oversight. This leads us to a crucial question: how do we systematically grant and manage tool access to ensure both security and functionality?

明日一瞥

Glimpses of Tomorrow

人工智能代理的未来并非遥不可及的科幻小说——它正在全球各地的研究实验室和创新型公司中迅速成形。尽管目前的生产系统在人工智能代理发展框架中仅处于3级或以下水平,但4级和5级能力的开发有望从根本上改变代理与工具的交互方式、从工具使用中学习的方式以及与人类的协作方式。

The future of AI agents isn’t a distant science fiction—it’s rapidly taking shape in research labs and innovative companies worldwide. While current production systems operate at Level 3 or below in the Agentic AI Progression Framework, the development of Level 4 and 5 capabilities promises to fundamentally transform how agents interact with tools, learn from tool uses, and collaborate with humans.

4级和5级智能体代表着自主能力的重大飞跃。4级智能体将发展出从经验中学习、调整策略以及在极少人工干预下处理复杂新情况的复杂能力。5级智能体目前仍处于理论阶段,但将拥有在目标设定和工具创建方面的真正自主性。

Level 4 and 5 agents represent a significant leap forward in autonomous capabilities. At Level 4, agents will develop sophisticated abilities to learn from experience, adapt their strategies, and handle complex, novel situations with minimal human oversight. Level 5 agents, still largely theoretical, would possess true autonomy in goal-setting and tool creation.

近期顶尖人工智能实验室的研究表明,这些先进的智能体与现有系统在三个根本方面有所不同:

Recent research at leading AI labs suggests these advanced agents will differ from current systems in three fundamental ways:

首先,它们将发展研究人员所谓的“元学习”能力——即学习如何更有效地学习的能力。92 例如,一个管理供应链的4级智能体不会只遵循预先设定的规则,而是会发现新的模式,并根据积累的经验调整策略。

First, they will develop what researchers call “meta-learning” capabilities—the ability to learn how to learn more effectively.92 A Level 4 agent managing a supply chain, for instance, wouldn’t just follow predetermined rules but would discover new patterns and adapt its strategies based on accumulated experience.

其次,这些智能体将展现出在工具使用和创建方面的高度灵活性。它们不会局限于预定义的工具,而是会识别自身能力的不足,并创建新工具或改进现有工具以满足新出现的需求。

Second, these agents will demonstrate advanced flexibility in tool use and creation. Rather than being limited to predefined tools, they will identify gaps in their capabilities and either create new tools or modify existing ones to meet emerging needs.

第三,它们将展现出人工智能研究人员所称的“战略意识”——理解其行为的更广泛影响,并做出考虑长远后果的决策。AlphaGo Zero 等研究突显了这一概念,该系统学会了从长远角度规划走法和策略,展现了一种对高级人工智能至关重要的预见能力。我们仍然需要将这种能力推广到各个领域。93

Third, they will exhibit what AI researchers term “strategic awareness”—understanding the broader implications of their actions and making decisions that account for long-term consequences. This concept is highlighted in research like AlphaGo Zero, where the system learned to plan moves and strategies with a long-term view of consequences, demonstrating a form of foresight essential for advanced AI. We still need this capability to be generalized across domains.93

从行动到思考

From Action to Thought

深入探索行动这一关键要素,揭示了人工智能代理的一个深刻真理:衡量其影响力的并非处理能力或算法复杂程度,而是其对世界产生真正影响的能力。从对工具界面的探索到适应性的复杂性,我们看到了行动能力如何将人工智能从理论转化为实践——从“知晓”转化为“行动”。

The journey through the action keystone reveals a profound truth about AI agents: their impact isn’t measured in processing power or algorithmic sophistication but in their ability to effect real change in the world. From our exploration of tool interfaces to the complexities of adaptability, we’ve seen how the ability to act transforms AI from theoretical to practical—from “knowing” to “doing.”

我们一路发现的悖论——例如,工具越多反而效率越低,以及行动能力必须精心协调——揭示了人工智能本身更深层次的真相。人工智能代理的成功并非在于最大化其能力,而在于找到合适的平衡点,使其能够在现实世界中高效运行。

The paradoxes we’ve uncovered along the way—how more tools can mean less effectiveness, how action capabilities must be carefully orchestrated—point to deeper truths about artificial intelligence itself. Success with AI agents isn’t about maximizing capabilities but about finding the right balance of abilities that enable effective performance in the real world.

但仅仅采取行动是不够的。正如我们在案例研究中所看到的,即使是拥有复杂行动能力的AI代理,如果无法深思熟虑自身行为的后果或记住过去的结果,也可能失败。一个能完美使用所有工具却缺乏推理能力、无法理解何时以及为何使用这些工具的代理,就像一个懂得操作所有设备却不会规划项目的工人一样。

But action alone isn’t enough. As we’ve seen through our case studies, even AI agents with sophisticated action capabilities can fail if they can’t think through the implications of their actions or remember past outcomes. An agent that can use every tool perfectly but lacks the reasoning to understand when and why to use them is like a worker who knows how to operate every piece of equipment but can’t plan a project.

这就引出了我们的下一个关键点:推理。在接下来的章节中,我们将探讨人工智能体如何发展认知能力,从而理解复杂情况、提前规划并就应采取的行动做出明智的决策。我们将发现,为什么有些智能体能够处理海量数据,却仍然做出看似违背常理的决策;以及领先的组织是如何构建不仅能行动,还能思考的系统。

This brings us to our next keystone: reasoning. In the coming chapter, we’ll explore how AI agents develop the cognitive capabilities to make sense of complex situations, plan ahead, and make intelligent decisions about which actions to take. We’ll discover why some agents can process vast amounts of data yet still make decisions that seem to defy common sense, and how leading organizations are building systems that don’t just act, but think.

行动与推理之间的关系是共生的——二者相互促进,彼此制约。正如人类的专业知识源于实践与思考的互动,高效的人工智能体也需要行动的能力和有效行动的智慧。对于任何希望在组织中充分发挥人工智能潜力的人来说,理解这种互动至关重要。

The relationship between action and reasoning is symbiotic—each empowers and constrains the other. Just as human expertise comes from the interplay between doing and thinking, effective AI agents need both the power to act and the wisdom to act well. Understanding this interplay is crucial for anyone looking to harness the full potential of AI in their organization.

第六章

CHAPTER 6

推理:从快速到明智

REASONING: FROM FAST TO WISE

2023年10月一个清爽的早晨,一家大型物流公司的首席执行官遭遇了她后来称之为“职业生涯中最昂贵的十五分钟”。他们新近部署的人工智能系统刚刚重新规划了价值120万美元的温控药品运输路线,以避开即将到来的风暴。这本应是一个明智的举措——但人工智能却忽略了这些备选路线违反了国际药品运输法规。等到人工操作员发现错误时,成千上万的货物已经驶向了那些无法合法接收它们的港口。

On a crisp October morning in 2023, the CEO of a major logistics company faced what she later called “the most expensive fifteen minutes of my career.” Their newly implemented AI system had just rerouted $1.2 million worth of temperature-sensitive pharmaceutical shipments to avoid an approaching storm system. A smart move—except the AI hadn’t considered that the alternate routes violated international pharmaceutical transport regulations. By the time human operators caught the error, thousands of shipments were headed toward ports that couldn’t legally accept them.

“人工智能完全按照训练内容执行,”首席执行官后来告诉我们。“它找到了最快的替代路线。但它没有考虑其决策的后果。这就是能够思考的人工智能和只能反应的人工智能之间的区别。”

“The AI did exactly what it was trained to do,” the CEO told us later. “It found the fastest alternative routes. But it didn’t reason through the implications of its decisions. That’s the difference between an AI that can think and one that can only react.”

作为该项目的顾问,我们亲眼目睹了这起事件,它揭示了一个很少有商业领袖真正理解的关于人工智能的关键真相:你的人工智能系统不仅仅是在解决难题——它们是在拿你的生意冒险。而且,风险从未如此之高。

This incident, which we witnessed firsthand as consultants on the project, illustrates a critical truth about AI that few business leaders truly understand: Your AI systems aren’t just solving puzzles—they’re gambling with your business. And the stakes have never been higher.

作为咨询顾问,我们见证了人工智能系统从简单的规则型系统发展到如今更为复杂的智能体。这段历程教会了我们一个至关重要的道理:人工智能的未来不仅仅在于速度,更在于它能否像人类一样,在面对复杂决策时进行深入思考和严谨推理。

As consultants, we’ve seen the evolution from simple rule-based systems to today’s more sophisticated AI agents. This journey has taught us a crucial lesson: the future of AI isn’t just about speed—it’s about the ability to think deeply and reason carefully, much like humans do when faced with complex decisions.

想想人类是如何做出复杂决策的。一位经验丰富的运营经理在决定何时安排维护时,不仅仅计算成本,还会权衡多种方案。他们会考虑对客户的影响、对供应链的连锁反应、对员工排班的影响,以及其他无数因素。他们会提前规划,预见潜在问题并制定应急预案。这种权衡各种影响并为不同方案做好准备的能力不仅有用,而且对于做出正确的决策至关重要。

Think about how humans make complex decisions. When an experienced operations manager decides when to schedule maintenance, they don’t just calculate costs—they reason through multiple scenarios. They consider the impact on customers, the ripple effects through the supply chain, the implications for worker schedules, and countless other factors. They plan ahead, anticipating potential problems and preparing contingencies. This ability to reason through implications and plan for different scenarios isn’t just helpful—it’s essential for making good decisions.

人工智能代理也是如此。无论是管理供应链、交易股票还是帮助客户,这些代理需要做的不仅仅是计算——它们还需要思考。它们需要理解上下文、考虑各种影响并为不同的可能性做好准备。如果没有这些能力,即使是最先进的人工智能代理也可能做出数学上完美但实际上却会造成灾难性后果的决策。

The same is true for AI agents. Whether they’re managing supply chains, trading stocks, or helping customers, these agents need to do more than calculate—they need to think. They need to understand the context, consider implications, and plan for different possibilities. Without these capabilities, even the most sophisticated AI agent can make decisions that are mathematically perfect but practically disastrous.

本章将带您深入了解人工智能的推理和规划。您将发现,为什么有些人工智能体能够处理海量信息,却仍然做出违背常理的决策;以及另一些人工智能体是如何学会推理,以有时甚至令其创造者都感到惊讶的方式应对复杂场景的。

In this chapter, we’ll take you inside the world of AI reasoning and planning. You’ll discover why some AI agents can process vast amounts of information yet still make decisions that defy common sense and how others have learned to reason their way through complex scenarios in ways that sometimes surprise even their creators.

我们将通过真实案例和前沿研究,探索不同类型的推理如何在高效的人工智能代理中协同运作。我们将深入研究人工智能决策中速度与质量之间引人入胜的关系,以及为什么有时放慢思考速度反而能带来更好的结果。更重要的是,您将学习如何构建能够对组织中的重要问题进行有效推理的AI代理。

Through real-world examples and cutting-edge research, we’ll explore how different types of reasoning come together in effective AI agents. We’ll investigate the fascinating relationship between speed and quality in AI decision-making and why sometimes thinking slower leads to better results. Most importantly, you’ll learn what it takes to build AI agents that can reason effectively about the problems that matter in your organization.

未来的旅程将挑战一些关于人工智能的常见假设。你会发现,人工智能的未来不仅仅在于更快的处理速度或更大的数据集,更在于构建能够深入思考复杂问题并为不确定的未来进行周全规划的智能体。

The journey ahead will challenge some common assumptions about artificial intelligence. You’ll discover that the future of AI isn’t just about faster processing or bigger datasets—it’s about building agents that can think deeply about complex problems and plan thoughtfully for uncertain futures.

人工智能推理:暂停的力量

AI Reasoning: Introducing The Power of Pause

两种心智的故事

The Tale of Two Minds

在其开创性著作《思考,快与慢》94中,诺贝尔奖得主丹尼尔·卡尼曼向我们介绍了人脑中两种截然不同的思维系统。系统1运作迅速、自动且毫不费力——我们用它来完成诸如识别面孔或在空旷的道路上驾驶等任务。而系统2则需要更慢、更深思熟虑的脑力活动——例如,我们在解决复杂的数学问题或规划国际象棋中的战略性走法时所运用的思维方式。

In his groundbreaking work “Thinking, Fast and Slow,”94 Nobel laureate Daniel Kahneman introduced us to the concept of two distinct systems of thinking in the human brain. System 1 operates quickly, automatically, and with little effort—it’s what we use for tasks like recognizing faces or driving on an empty road. System 2, on the other hand, requires slower, more deliberate mental work—the kind we employ when solving a complex math problem or planning a strategic move in chess.

我们记得曾为一家大型银行的欺诈检测团队部署过一套早期的AI系统。这套系统能够根据预定义的模式快速识别可疑交易——典型的系统1思维。但当我们要求它理解复杂的多步骤欺诈手段时,它就力不从心了。它无法像人类分析师那样,通过仔细的思考将各种线索串联起来。

We remember implementing an early AI system for a major bank’s fraud detection team. The system was lightning-fast at flagging suspicious transactions based on pre-defined patterns—classic System 1 thinking. But when we needed it to understand complex, multi-step fraud schemes, it fell short. It couldn’t connect the dots the way human analysts could through careful deliberation.

直到最近,包括最新LLM在内的生成式人工智能系统主要还是在系统1思维模式下运行。它们擅长快速模式匹配和即时响应,但在需要深度推理和仔细思考的任务上却力不从心。这种局限性在我们的咨询工作中日益凸显,客户需要的人工智能系统不仅要反应迅速,还要能够系统地思考复杂的业务问题。

Until recently, generative AI systems, including the latest LLMs, had primarily operated in the realm of System 1 thinking. They excel at rapid pattern matching and instant responses but struggle with tasks requiring deep reasoning and careful consideration. This limitation has become increasingly apparent in our consulting work, where clients need AI systems that cannot just respond quickly but also think through complex business problems methodically.

2024年9月12日,OpenAI发布了o1 Preview(代号“Strawberry”),人工智能社区的兴奋之情溢于言表——尤其是对于我们这些深度投入人工智能代理领域的人士而言。我们深知此次发布意义非凡。与GPT或Gemini Flash等传统大语言模型不同,OpenAI o1引入了一种革命性的“思维链”推理系统。

When OpenAI released o1 Preview (codenamed “Strawberry”) on September 12, 2024, the excitement in the AI community was palpable—especially for us deeply invested in AI agents. We understood the profound significance of this launch. Unlike traditional LLMs, such as GPT or Gemini Flash, OpenAI o1 introduced a revolutionary “chain-of-thought” reasoning system.

自 o1 发布以来,人工智能领域见证了高级推理模型(也称为大型推理模型 (LRM))的迅速涌现,例如 DeepSeek R1、Gemini Thinking 和 o3。

Since the launch of o1, the AI landscape has witnessed a rapid emergence of advanced reasoning models, also now called large reasoning models (LRMs), such as DeepSeek R1, Gemini Thinking, and o3.

与主要基于海量文本数据学习预测下一个词的传统大语言模型(LLM)不同,大型推理模型的设计重点在于深思熟虑的推理和迭代式问题解决。它们采用强化学习技术开发,鼓励模型不断完善其思维,尝试不同的方法,识别错误,并随着时间的推移改进其响应。

Instead of being trained like traditional LLMs, which primarily learn to predict the next word based on vast amounts of text data, large reasoning models are designed with a strong emphasis on deliberate reasoning and iterative problem-solving. They are developed using reinforcement learning techniques that encourage the model to refine its thinking, attempt different approaches, recognize errors, and improve its responses over time.

这使得这些模型能够超越简单的模式识别和预测,发展成为能够以结构化方式进行推理、规划和评估决策的模型,使其比传统的大语言模型更接近人类的问题解决方式。95

This allows these models to move beyond simple pattern recognition and prediction, evolving into a model that can reason, plan, and evaluate decisions in a structured way, making it more aligned with human-like problem-solving than traditional LLMs.95

这些模型能够将复杂的难题——无论是数学、编程还是科学领域——分解成逻辑清晰、循序渐进的解决方案。正因如此,它们能够解决博士生级别的难题。通过运用其“思维链”方法,这些模型可以将错综复杂的问题分解成逻辑步骤,从而获得更精确的解决方案。这种深思熟虑的推理过程对人工智能体至关重要,因为它使它们能够做出更明智的决策,并以更高的精度执行任务。OpenAI 在发布 o1 时就曾这样描述过,但我们希望通过实际测试来验证这种能力对人工智能体的实际应用价值。

These models can break down complex problems—whether in mathematics, coding, or science—into logical, step-by-step solutions. Thanks to this, they can solve problems at the level of a PhD student. By leveraging their “chain-of-thought” approach, the models can deconstruct intricate problems into logical steps for more accurate solutions. This deliberate reasoning process is crucial for AI agents, as it enables them to make more informed decisions and execute tasks with greater precision. This was what OpenAI said when launching o1, but we wanted to test this to see how this capability could be useful for AI agents.

特征

Characteristic

大型语言模型(LLM)

Large Language Models (LLMs)

大型推理模型(LRM)

Large Reasoning Models (LRMs)

训练数据

Training Data

海量非结构化文本语料库

Vast unstructured text corpora

结构化数据和显式推理框架

Structured data and explicit reasoning frameworks

推理深度

Reasoning Depth

仅限于基于统计模式的表面推理

Limited to surface-level reasoning based on statistical patterns

强调因果关系和系统分析

Emphasizes causal relationships and systematic analysis

适应性

Adaptability

可广泛推广至各种语言任务

Generalizes broadly across diverse language tasks

专注于技术或逻辑密集型领域

Specializes narrowly in technical or logic-heavy domains

主要优势

Key Strength

擅长翻译、概括和对话

Excels at translation, summarization, and dialogue

擅长数学、编程和多步骤决策

Excels at math, coding, and multi-step decision-making

输出类型

Output Type

生成概率文本输出

Produces probabilistic text outputs

产生确定性的逻辑结论

Generates deterministic logical conclusions

表 6.1:LLM 和 LRM 的主要区别(来源:© OpenAI)

Table 6.1: Main differences between an LLM and an LRM (Source: © OpenAI)

人工智能规模化发展的转变:从更多计算转向更多思考

A Shift in AI Scaling: From More Compute to More Thinking

我们对大型推理模型(LRM)的未来充满希望还有另一个原因。

There is another reason why we have a lot of hope for the future of LRMs.

迄今为止,大型语言模型(LLM)的扩展规律遵循着一个简单的模式:更多的处理能力和更多的数据带来更好的性能。这一被称为训练时计算扩展的原则推动了人工智能的发展,GPT、Grok 和 Gemini 等模型的规模和性能呈指数级增长。然而,我们正面临根本性的极限:

Until now, scaling laws for LLMs have followed a straightforward pattern: more processing power and more data lead to better performance. This principle, known as train-time compute scaling, has driven the evolution of AI, with models like GPT, Grok, and Gemini growing exponentially in size and power. However, we are hitting fundamental limits:

数据限制——我们用于训练这些模型的高质量、多样化的文本数据正在减少。

Data Constraints—We are running out of high-quality, diverse text data to train these models.

计算成本——训练大规模模型需要数十亿美元的 GPU 和能源,这使其不可持续。

Compute Costs—Training massive models requires billions of dollars in GPUs and energy, making it unsustainable.

这正是大型推理模型(LRM)带来颠覆性变革的地方。与传统的大型语言模型(LLM)不同,LRM 不只依赖于预训练知识,而是在推理(或测试)阶段进行学习。LRM 不需要预先准备庞大的数据集和强大的处理能力,而是以时间作为交换——思考时间越长,模型表现就越好。

This is where LRMs introduce a game-changing shift. Unlike traditional LLMs, LRMs don’t just rely on pre-trained knowledge—they learn at inference (or test) time. Instead of requiring enormous datasets and processing power upfront, LRMs trade them for time—the longer they think, the better they become.

这一突破带来了一条新的扩展规律:性能的提升并非源于增加训练数据,而是源于延长推理时间。换句话说,人工智能花在推理、检索和优化输出上的时间越多,它就越智能。96

This breakthrough has led to a new scaling law: performance improves not by increasing training data but by extending inference time. In other words, the more time an AI spends reasoning, retrieving, and refining its output, the smarter it gets.96
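A toy way to see why extra inference-time work can pay off: if a model solves a problem with probability p in a single attempt, then sampling and checking more attempts raises the chance that at least one is correct. This is a deliberate simplification of the scaling law described above, with hypothetical numbers:

```python
# Simplified sketch: more inference-time attempts -> higher success odds.
# This ignores verification cost and correlated errors; numbers are assumed.

def success_probability(p: float, n_attempts: int) -> float:
    """P(at least one of n independent attempts is correct)."""
    return 1.0 - (1.0 - p) ** n_attempts

p = 0.3  # assumed single-attempt accuracy
for n in (1, 2, 4, 8, 16):
    print(n, round(success_probability(p, n), 3))
```

The curve flattens as attempts grow, which is why real systems pair extra “thinking time” with smarter search rather than blind resampling.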

对于智能体人工智能系统而言,这可能是一次巨大的飞跃。借助此类模型,智能体可以实时主动学习、适应和优化自身行为,而非执行预先设定的决策。这意味着人工智能智能体不仅能更快地获取信息,而且随着执行任务时间的延长,其智能水平也可能不断提高。未来将会给出答案。

For agentic AI systems, this could be a massive leap forward. With such models, agents could actively learn, adapt, and optimize their actions in real-time instead of executing pre-programmed decisions. This means AI agents would not just retrieve information faster but potentially become more intelligent the longer they work on a task. The future will tell us.

图像

图 6.1:两种缩放规律的示意图:训练时计算和测试时计算(来源:© OpenAI)

Figure 6.1: Illustration of the two scaling laws: train-time and test-time compute (Source: © OpenAI)

搭建实验舞台

Setting Up the Stage of Our Experiment

为了评估大型语言模型(LLM)和大型推理模型(LRM)在能力上的差异,我们需要一种方法来观察这些系统解决复杂问题的过程。经过多次讨论,我们意识到填字游戏可以提供一个理想的实验环境。它独特地结合了简单的独立任务和复杂的整合——这正是我们测试不同类型模型思维所需要的。

To assess the differences in capabilities between LLMs and LRMs, we needed a way to observe these systems solving complex problems. After much discussion, we realized that crossword puzzles could provide the perfect laboratory. They offer a unique combination of simple individual tasks that require complex integration—exactly what we needed to test different types of model thinking.

我们设计了一个包含相互关联约束条件的填字游戏,需要仔细推理才能正确解答。你可以把它想象成一个数独游戏,改变一个数字会影响到其他许多数字——只不过在这里,我们要处理的是一些必须在多个方向上都能正确组合的单词,而且这些单词还要在语义上保持一致。

We designed a crossword puzzle with interconnected constraints that required careful reasoning to solve correctly. Think of it like a Sudoku puzzle where changing one number affects many others—except here, we’re dealing with words that must fit together in multiple directions while making semantic sense.

谜题中包含一些看似简单的线索——比如说出一位R&B歌手的名字(玛丽·J·布莱姬),指出一位戴着高顶礼帽的吉他手(Slash),或者说出《皆大欢喜》中的一片森林。但真正的挑战在于我们设置的种种限制,它们构成了一张错综复杂的相互依存之网。例如,“13 横的第四个字母也是 6 纵的第四个字母”,以及“1 横的首字母也是 1 纵的首字母”。这些限制条件意味着每个答案不仅要符合自身的线索,还要作为一个相互关联的整体的一部分。

The puzzle included clues that seemed straightforward enough—naming an R&B singer (Mary J. Blige), identifying a guitarist with a top hat (Slash), or naming a forest from “As You Like It.” But the real challenge was in the constraints we built into the puzzle, which created a web of interdependencies. For instance, “The fourth letter of 13 Across is also the fourth letter of 6 Down,” and “The first letter of 1 Across is also the first letter of 1 Down.” These constraints meant that each answer had to work not just for its own clue, but as part of an interconnected whole.

图像

图 6.2:实验中提供给 LLM 和 LRM 的填字游戏(来源:© Bornet 等人)

Figure 6.2: The crossword given to the LLM and the LRM for the experiment (Source: © Bornet et al.)
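The intersection rules described above can be written down as a small constraint checker. Only HEIDI is confirmed as an answer in the chapter; the other entries below are our own guesses, included purely to exercise the checking logic rather than to solve the real grid:

```python
# Minimal checker for the two intersection constraints quoted in the text.
# Answers other than HEIDI are illustrative guesses, not confirmed solutions.

answers = {
    "13A": "SNYDER",  # "Watchmen" director Zack (assumed)
    "6D":  "HEIDI",   # Little girl of the Alps (given in the chapter)
    "1A":  "MARY",    # ___ Blige (assumed)
    "1D":  "MRS",     # Wife, with "the" (assumed)
}

def letters_match(a: str, i: int, b: str, j: int) -> bool:
    """True if letter i of answer a equals letter j of answer b (1-indexed)."""
    return a[i - 1] == b[j - 1]

constraints = [
    ("13A", 4, "6D", 4),  # fourth letters must agree
    ("1A", 1, "1D", 1),   # first letters must agree
]

for a, i, b, j in constraints:
    ok = letters_match(answers[a], i, answers[b], j)
    print(f"{a}[{i}] == {b}[{j}]: {ok}")
```

This is exactly the kind of cross-checking the puzzle demands: every new answer must be validated against every constraint it touches, not just its own clue.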

此外,以下是我们给模型的指令:

Besides, here are the instructions we gave to the models:

请根据以下线索解开这个填字游戏:

“Solve this crossword using these clues:

横向

ACROSS

1. ___ Blige(R&B歌手)

1. ___ Blige (R&B singer)

6. 这真让人发笑

6. It’s a laugh

9. 牛仔比赛

9. Contest for cowboys

10. 之前,在颂歌中

10. Before, in odes

11. 一位头戴高礼帽、只有一个名字的吉他手

11. One-named guitarist with a top hat

12. 揭丑记者塔贝尔

12. Muckraker Tarbell

13.《守望者》导演扎克

13. “Watchmen” director Zack

15. 有喝酒、音乐的地方

15. Spot with drinking, music

纵向

DOWN

1. 妻子,带“the”

1. Wife, with “the”

2. 带有三角形标志的互联网服务提供商

2. ISP with a triangular logo

3. 食物数字(缩写)

3. Food fig.

6. 阿尔卑斯山的小女孩

6. Little girl of the Alps

7.《皆大欢喜》森林

7. “As You Like It” forest

8. 情人节象征”

8. Valentine’s Day symbol”

令人惊讶的结果

The Surprising Results

结果令人震惊。LLM 几乎立即做出反应,但却出现了五个重大错误。相比之下,LRM 花了两分多钟才做出反应,但却给出了近乎完美的解决方案。让我们一起来回顾一下我们的实验,以及我们可以从中学到什么。

The results were striking. The LLM responded almost instantly but made five significant errors. In comparison, the LRM took over two minutes to respond but produced a nearly perfect solution. Let us walk through our experiment and what we can learn from it.

LLM实验

The Experiment with the LLM

当我们第一次把这个谜题交给LLM(GPT-4o)时,它的解题思路让我们印象深刻。这让我们想起一个才华横溢却过于自信的学生应试——不先理解整体情况,就直接切入正题。短短几秒钟,它就填出了大部分线索的答案。乍一看,这似乎令人印象深刻。但当我们开始检查结果时,兴奋之情转为担忧。我们发现了五处重大错误,全部都出现在需要正确连接单词的地方。

When we first presented the puzzle to the LLM (GPT-4o), we were struck by its approach. It reminded us of watching a brilliant but overconfident student tackle an exam—diving in immediately without taking the time to understand the full picture. Within seconds, it had filled in answers for most clues. At first glance, this seemed impressive. But as we began checking the results, our excitement turned to concern. We found five significant errors, all in places where words needed to interconnect correctly.

图像

(红色部分为错误)

图 6.3:LLM 给出的回答(来源:© Bornet 等人)

(in red are the mistakes)

Figure 6.3: The responses given by the LLM (Source: © Bornet et al.)

观察到的这种行为与我们发现的LLM在处理复杂、相互关联的约束条件时存在的根本局限性直接相关。让我们来看看当我们向LLM呈现填字游戏时发生了什么:

This observed behavior ties directly into what we discovered about LLMs’ fundamental limitations when processing complex, interconnected constraints. Let’s examine what happened when we presented the LLM with the crossword puzzle:

最引人注目的是它的响应模式——迅速、自信,但存在根本性的缺陷。正如学生可能在完全理解问题之前就急于作答一样,LLM 也表现出我们所说的“过早下结论”——在充分处理所有限制条件之前就匆忙得出结论。97 这种行为源于 LLM 的基本运作方式:它们主要以顺序方式处理信息,并根据训练数据中观察到的模式进行预测。98

The most striking aspect was its response pattern—immediate, confident, but fundamentally flawed. Just as a student might rush into answering before fully understanding the question, the LLM displayed what we call “premature closure”—jumping to conclusions before fully processing all the constraints.97 This behavior stems from how LLMs fundamentally operate: they process information in a primarily sequential manner, making predictions based on patterns they’ve seen in their training data.98

这项实验最引人注目之处在于通过观察模型的反应来了解其注意力模式。当我们分析其输出时,我们注意到它会:

What made this experiment particularly revealing was watching the model’s attention patterns through its responses. When we analyzed its outputs, we noticed it would:

1.集中精力关注眼前的线索。

1. Focus intensely on the immediate clue at hand

2.生成一个看似合乎逻辑的答案

2. Generate a seemingly logical answer

3.未充分检查交叉约束条件,就直接跳到下一个线索。

3. Move on to the next clue without sufficiently checking intersecting constraints

了解其局限性

Understanding the limitations

这种方法揭示了三个关键局限性,有助于我们理解LLM以及人工智能代理未来面临的挑战:

This approach revealed three critical limitations that help us understand both LLMs and the future challenges for AI agents:

首先是“语境碎片化”问题。人类自然而然地能够同时考虑多个约束条件——例如,思考一个词如何既符合线索又能与其他交叉词正确衔接——而 LLM 却难以保持这种整体性视角。这种局限性源于 LLM 处理信息的方式:其注意力机制虽然强大,但并不能真正复制人类的工作记忆。

First, there’s the “context fragmentation” problem. While humans naturally hold multiple constraints in mind simultaneously—thinking about how a word must fit both its clue and intersect correctly with crossing words—the LLM struggled to maintain this holistic view. This limitation stems from how LLMs process information through their attention mechanisms, which, while powerful, don’t truly replicate human working memory.

其次,我们观察到一种我们称之为“虚假自信综合症”的现象。该模型会以与正确答案相同的高置信度给出错误答案,无法区分不同程度的确定性。这与牛津大学最近的研究结果相吻合,该研究发现,LLM 在面对模糊或约束较多的问题时,往往会做出过度自信的反应。99

Second, we observed what we call “false confidence syndrome.” The model would provide incorrect answers with the same high confidence as correct ones, failing to distinguish between different levels of certainty. This mirrors findings from recent research at Oxford about LLMs’ tendency toward overconfident responses when faced with ambiguous or constraint-heavy problems.99

最能说明问题的是,该模型有时甚至违反了最基本的约束条件——例如单词中的字母数量。这揭示了 LLM 在处理显式规则与习得模式时存在根本性的局限。虽然它们擅长从训练数据中进行模式匹配,但却难以应对需要精确遵守的、基于规则的严格约束。

Most tellingly, the model sometimes violated even the most basic constraints—like the number of letters in a word. This revealed a fundamental limitation in how LLMs handle explicit rules versus learned patterns. While they excel at pattern matching from their training data, they struggle with rigid, rule-based constraints that require precise adherence.

当我们改进方法,将拼图分解成更小的组件时,模型的性能有所提升,但仍然无法达到人类的准确度。这与研究结果相符:虽然 LLM 在许多任务中都是强大的工具,但它们本质上缺乏人类用于解决复杂难题的那种工作记忆和约束满足能力。100

When we modified our approach and broke down the puzzle into smaller components, the model’s performance improved somewhat, but it still couldn’t match human-level accuracy. This aligns with research showing that while LLMs can be powerful tools for many tasks, they fundamentally lack the type of working memory and constraint satisfaction capabilities that humans use for complex puzzles.100

这些局限性凸显了我们在人工智能代理工作中的一个重要洞见:我们需要构建能够同时维护和处理多种约束条件,并能准确评估自身置信度的系统。这不仅仅是增强模型功能的问题,更是开发能够处理许多现实世界任务所需的那种相互关联、基于规则的推理的新架构的问题。

These limitations highlighted a crucial insight for our work with AI agents: the need to build systems that can maintain and work with multiple constraints simultaneously while also accurately assessing their own certainty levels. It’s not just about making models more powerful—it’s about developing new architectures that can handle the type of interconnected, rule-based reasoning that many real-world tasks require.

我们用填字游戏进行的实验完美地体现了我们在开发更强大的AI智能体时所面临的挑战。它表明,尽管当前的AI系统在模式识别方面已经非常出色,但它们仍然缺乏一些人类习以为常的基本认知能力——例如在解决问题时同时考虑多个约束条件。

Our experiment with the crossword puzzle serves as a perfect microcosm of the challenges we face in developing more capable AI agents. It shows that while current AI systems can be impressively sophisticated in their pattern recognition, they still lack some of the fundamental cognitive capabilities that humans take for granted—like holding multiple constraints in mind while working toward a solution.

模式匹配挑战

The pattern-matching challenge

为了理解我们的填字游戏实验为何揭示了这些局限性,我们需要揭开这些人工智能系统内部运作的神秘面纱。尽管大型语言模型(LLM)的输出结果令人印象深刻,有时甚至近乎神奇,但它们的“思考”方式与人类截然不同。相反,它们执行的是通常所说的“下一个词元预测”——本质上是根据训练过程中学习到的模式来猜测下一个词或符号应该是什么。

To understand why our crossword puzzle experiment revealed these limitations, we need to demystify what’s actually happening inside these AI systems. Despite their impressive outputs that can sometimes seem almost magical, LLMs don’t “think” in the way humans do. Instead, they perform what is commonly called “next-token prediction”—essentially guessing what word or symbol should come next based on patterns they’ve learned during training.

你可以把它想象成一个极其复杂的自动补全系统。就像你的手机会在你输入信息时建议下一个词一样,LLM 会根据之前看到的内容不断预测接下来应该出现的文本。关键区别在于这些预测的规模和复杂程度。

Think of it like an extremely sophisticated autocomplete system. Just as your phone might suggest the next word when you’re typing a message, an LLM is constantly predicting what text should follow based on what it has seen before. The key difference is the scale and sophistication of these predictions.

从最基本的层面来说,大型语言模型(LLM)通过复杂的模式匹配来执行研究人员所谓的“表面推理”。101 当它们遇到问题(例如我们的填字游戏)时,首先会尝试将其与训练数据中已见过的类似模式进行匹配。这有助于解释为什么模型能够为单个线索生成合理的答案(因为它在训练过程中见过许多类似的问答对),但却难以处理相互关联的约束(因为它很少见到同时处理多个交叉词语要求的例子)。

At their most basic level, LLMs perform what researchers call “surface-level reasoning” through sophisticated pattern matching.101 When they encounter a problem—like our crossword puzzle—they first try to match it with similar patterns they’ve seen in their training data. This helps explain why the model could generate plausible answers for individual clues (it had seen many similar question-answer pairs in its training) but struggled with the interconnected constraints (it had rarely seen examples of managing multiple intersecting word requirements simultaneously).
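To make “next-token prediction” concrete, here is a toy bigram predictor built by counting word pairs in a tiny corpus. Real LLMs use neural networks trained on vastly more data; the counting here is only an analogy for pattern-based prediction:

```python
# Toy bigram "next-token predictor": for each word, remember which words
# followed it in the corpus, then predict the most frequent follower.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate the fish".split()

follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(word: str) -> str:
    """Return the most frequent token observed after `word` in the corpus."""
    return follows[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat" appears after "the" most often
```

The predictor has no notion of constraints or meaning: it only reproduces patterns it has counted, which is the limitation the crossword experiment exposed at scale.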

当我们尝试通过将拼图分解成更小的组成部分来帮助模型时,LLM 的这种模式匹配特性就更加明显了。即使我们明确地列出了约束条件,模型仍然难以解决问题,因为它本质上是在试图匹配模式,而不是真正地推理拼图不同部分之间的关系。

This pattern-matching nature of LLMs became even more apparent when we tried to help the model by breaking down the puzzle into smaller components. Even though we made the constraints explicit, the model still struggled because it was fundamentally trying to match patterns rather than truly reason about the relationships between different parts of the puzzle.

LRM实验

The Experiment with the LRM

接下来轮到LRM了。它的方法截然不同。它耗时两分多钟——在计算领域,这简直是漫长的岁月——但却得出了一个近乎完美的解决方案。它的计算过程有条不紊,深思熟虑。

Then came the LRM’s turn. Its approach was radically different. It took over two minutes—an eternity in computational terms—but produced a nearly perfect solution. Its process was methodical and deliberate.

最引人入胜之处在于,LRM能够实时可视化地展示其思考过程,一步步地展现其解题思路。这让我们得以清晰而深刻地了解人工智能系统如何分解并解决复杂问题。

What made this particularly fascinating was that the LRM visually displayed its thought process in real time, showing its reasoning step by step as it worked through the puzzle. This provided a clear and insightful glimpse into how an AI system breaks down and solves complex problems.

图像

图 6.4:LRM 给出的答案大多是正确的(来源:© Bornet 等人)

Figure 6.4: The responses given by the LRM are mostly correct (Source: © Bornet et al.)

战略准备:有效解决问题的基础

Strategic Preparation: The Foundation of Effective Problem-Solving

从一开始,我们就看到了截然不同的方法。屏幕上出现的第一条信息是:“正在整理选项。我正在梳理纵横字谜的线索,列出横向和纵向的选项,并确定每个选项的字母数量。这有助于更高效地指导解题过程。”这种有条不紊的准备阶段与认知科学家长期以来在人类问题解决专家身上观察到的现象相吻合——即在尝试解决问题之前,他们倾向于构建问题空间的心理模型。

From the very beginning, we could see a radically different approach. The first message that appeared on our screen read: “Laying out the options. I’m mapping out the crossword puzzle clues, listing both the across and down entries, and determining the letter count for each. This helps guide the solving process more efficiently.” This methodical preparation phase aligns with what cognitive scientists have long observed in expert human problem-solvers—the tendency to build a mental model of the problem space before attempting solutions.

赫伯特·西蒙和艾伦·纽厄尔在他们的著作《人类问题解决》中指出,问题解决专家通常会比新手花费更多时间来理解问题,然后再尝试寻找解决方案。102 这个“准备阶段”不仅仅是收集信息,它还涉及构建心理学家所说的“问题空间表征”,即一个能够捕捉问题显性和隐性约束的心理模型。

In their work “Human Problem Solving,” Herbert Simon and Allen Newell demonstrated that expert problem solvers typically spend more time than novices in understanding a problem before attempting solutions.102 This “preparation phase” isn’t just about gathering information—it’s about building what psychologists call a “problem space representation,” a mental model that captures both the explicit and implicit constraints of the problem.

这与人类专业知识的运用如出一辙。试想一位国际象棋特级大师分析一个复杂的棋局。虽然他们可能对下一步棋有即时的直觉(系统1),但最优秀的棋手仍然会花时间通过仔细分析来验证自己的直觉(系统2)。同样,在复杂的商业决策中,即时的模式匹配可能很危险——优秀的管理者懂得何时应该放慢脚步,系统地推敲各种影响。

This mirrors what we see in human expertise. Consider a chess grandmaster analyzing a complex position. While they might have an immediate intuition about a move (System 1), the best players will still take time to verify their intuition through careful analysis (System 2). Similarly, in complex business decisions, instant pattern-matching can be dangerous—the best executives know when to slow down and reason through implications systematically.

这对企业意味着什么

What This Means for Businesses

我们对LRM系统严谨的准备阶段的观察,揭示了在商业环境中实施人工智能代理的重要经验。正如LRM在尝试解决方案之前首先绘制出整个问题空间图一样,企业也需要构建能够收集和分析上下文信息的AI系统,然后再采取行动。这不仅仅是收集数据,而是要全面了解决策环境。

Our observations of the LRM’s methodical preparation phase revealed important lessons for implementing AI agents in business settings. Just as the LRM began by mapping the entire problem space before attempting solutions, organizations need to build AI systems that gather and analyze contextual information before taking action. This isn’t just about collecting data—it’s about creating a comprehensive understanding of the decision environment.

想想国际象棋大师在落子前是如何研究棋局的。同样,人工智能系统也需要结构化的方法来理解其运行环境。对企业而言,这意味着要投资建设强大的数据基础设施,使人工智能代理能够访问和处理全面的上下文信息。这也意味着在做出关键决策之前,需要制定清晰的信息收集和验证流程。

Consider how a chess grandmaster studies a position before making a move. Similarly, AI systems need structured approaches to understand their operational context. For businesses, this means investing in a robust data infrastructure that enables AI agents to access and process comprehensive contextual information. It also means developing clear protocols for information gathering and validation before critical decisions are made.

一家制造企业客户通过为其人工智能代理创建所谓的“决策上下文图”来实施这一原则。在做出生产路线决策之前,他们的系统需要收集有关资源可用性、维护计划、工人轮班以及下游依赖关系的数据。这种全面的准备阶段有助于防止那些经常困扰不够完善的系统的错误连锁反应。

One manufacturing client implemented this principle by creating what they called “decision context maps” for their AI agents. Before making production routing decisions, their system was required to gather data about resource availability, maintenance schedules, worker shifts, and downstream dependencies. This comprehensive preparation phase helped prevent the kinds of cascading errors that often plague less thorough systems.

准备的力量

The Power of Preparation

就像 LRM 需要时间理解填字游戏的全部内容后再尝试解答一样,当你帮助 AI 代理理解你请求的完整上下文时,它们也能给出更好的结果。这就像给新团队成员做入职培训一样——他们掌握的背景信息越多,工作效率就越高。

Just as the LRM took time to understand the full scope of the crossword puzzle before attempting solutions, you’ll get better results from AI agents when you help them understand the complete context of your request. Think of it like briefing a new team member—the more context they have, the better their work will be.

例如,在使用人工智能代理撰写商业报告时,不要仅仅要求它“进行市场分析”。相反,要提供行业背景信息、您关注的具体竞争对手、您注意到的特定趋势,以及您计划如何利用这些信息。这些背景信息有助于人工智能代理恰当地构建分析框架,并提供更具针对性的见解。

For example, when working with an AI agent on a business report, don’t just ask for “a market analysis.” Instead, provide context about your industry, specific competitors you’re concerned about, particular trends you’ve noticed, and how you plan to use the information. This contextual information helps the AI agent frame its analysis appropriately and provide more relevant insights.

我们合作过的一位高管通过开发她所谓的“情境模板”,显著提升了她与人工智能代理的互动效果。每次互动前,她都会概述背景、限制条件和预期结果。“这就像会前准备一份简报,”她解释说,“我多花两分钟解释背景,就能省去之后几个小时的反复沟通。”

One executive we worked with dramatically improved her results with AI agents by developing what she called a “context template.” Before each interaction, she would outline the background, constraints, and desired outcomes. “It’s like having a pre-meeting brief,” she explained. “The extra two minutes I spend explaining the context saves hours of back-and-forth later.”
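A “context template” of this kind is easy to systematize. The field names below are our own invention; the idea is simply to assemble background, constraints, and desired outcome into one brief before each request:

```python
# Sketch of a "context template": a pre-brief assembled before each
# AI-agent request. Field names and the sample content are illustrative.

def build_brief(background: str, constraints: list, desired_outcome: str) -> str:
    """Assemble a structured brief from the three template fields."""
    lines = ["BACKGROUND:", background, "", "CONSTRAINTS:"]
    lines += [f"- {c}" for c in constraints]
    lines += ["", "DESIRED OUTCOME:", desired_outcome]
    return "\n".join(lines)

brief = build_brief(
    background="Mid-size retailer evaluating entry into the Nordic market.",
    constraints=["Focus on competitors A and B", "Use 2024 data only"],
    desired_outcome="A one-page market analysis for the board meeting.",
)
print(brief)
```

The brief then becomes the preamble of the actual request, giving the agent the context it needs before it starts work.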

假设检验与验证:一种系统性的不确定性处理方法

Hypothesis Testing and Validation: A Systematic Approach to Uncertainty

LRM推理中最引人入胜之处或许在于其系统性地测试和验证潜在答案的方法。我们从LRM的这段话中便可看出这一点:“我正在考虑‘酒馆’、‘歌舞表演’、‘鸡尾酒’、‘舞厅’、‘迪斯科舞厅’、‘酒吧’和‘夜总会’作为答案,但6D的答案‘海蒂’与之冲突。”

Perhaps the most fascinating aspect of the LRM’s reasoning was its systematic approach to testing and validating potential solutions. We observed this when the LRM wrote: “I’m weighing ‘TAPROOMS’, ‘CABARET’, ‘COCKTAIL’, ‘DANCE CLUB’, ‘DISCOTHEQUE’, ‘BARROOMS’, and ‘NIGHTCLUB’ as answers, but face a conflict with ‘HEIDI’ for 6D.”

传统人工智能系统处理信息的方式就像火车在固定轨道上运行一样——它们遵循预先设定的路径从输入到输出。相比之下,LRM 的运行方式更像汽车,可以根据交通状况选择路线。当遇到问题时,它会创建一个动态的潜在路径网络,信息可以通过这些路径流动。

Traditional AI systems process information like trains running on fixed tracks—they follow predetermined pathways from input to output. The LRM, in contrast, operates more like a car that can choose its route based on traffic conditions. When it encounters a problem, it creates a dynamic network of potential pathways through which information can flow.

这个过程与认知科学家所说的人类问题解决中的“生成与测试”策略非常相似。帕特·兰利和赫伯特·西蒙的研究表明了成功的问题解决者如何运用他们所谓的“生成与测试”策略——创建多个潜在解决方案,并根据已知的约束条件系统地评估它们。103

This process closely resembles what cognitive scientists call the “generate and test” strategy in human problem-solving. Research by Pat Langley and Herbert Simon demonstrated how successful problem solvers use what they called the “generate and test” strategy—creating multiple potential solutions and systematically evaluating them against known constraints.103

LRM 的方法揭示了系统性假设检验的本质,这项技能使其在人工智能领域脱颖而出。面对不确定性,它并没有固守单一解决方案,而是生成了多个备选方案,这与人类创造性解决问题时所展现的“发散性思维”相呼应。104 随后,它仔细地将每一种可能性与任务的约束条件进行比对,例如,它写道:“正在检查字母对齐情况。好的,让我看看:12 横的首字母与 6 纵的第三个字母匹配。”

The LRM’s approach revealed the essence of systematic hypothesis testing, a skill that sets it apart in the AI landscape. When faced with uncertainty, it didn’t lock onto a single solution but instead generated multiple alternatives, echoing the “divergent thinking” seen in human creative problem-solving.104 It then meticulously checked each possibility against the constraints of the task, demonstrating this when it wrote, “Checking letter alignment. OK, let me see: the first letter of 12 Across matches the third letter of 6 Down.”

然而,LRM真正令人瞩目的是其不断改进的能力。它并不固守最初的想法,而是持续演进,从每一步中学习。这种创造力、精确性和适应性之间的动态互动,在我们看来,展现了人工智能推理领域的新前沿。

What made the LRM truly remarkable, however, was its capacity for progressive refinement. It didn’t cling to initial ideas but continuously evolved its approach, learning from every step it took. This dynamic interplay of creativity, precision, and adaptability showcased, in our view, a new frontier in AI reasoning.

这对企业意味着什么

What This Means for Businesses

LRM 生成和测试多种潜在解决方案的方法为人工智能实施提供了至关重要的见解。对于实施人工智能代理的企业而言,这意味着设计能够生成和评估多种解决方案路径的系统。

The LRM’s approach to generating and testing multiple potential solutions offers crucial insights for AI implementation. For businesses implementing AI agents, this means designing systems that can generate and evaluate multiple solution pathways.

我们合作过的一家金融服务公司将这一原则应用于他们的交易算法。他们的AI代理没有采用单一的交易策略,而是开发了多种方法,并利用历史数据和当前市场状况进行测试。这种多假设方法带来了更稳健的决策和更好的风险管理。

A financial services firm we worked with applied this principle to their trading algorithms. Instead of pursuing a single trading strategy, their AI agents developed multiple approaches and tested them against historical data and current market conditions. This multi-hypothesis approach led to more robust decision-making and better risk management.

关键在于创建能够从成功和失败中学习的系统。每一次尝试,无论成功与否,都能提供宝贵的数据,从而改进未来的决策。组织应建立清晰的流程来收集和分析这些信息,从而创建一个持续学习的循环,随着时间的推移不断提升人工智能的性能。

The key is creating systems that can learn from both successes and failures. Each attempted solution, whether successful or not, provides valuable data that can improve future decision-making. Organizations should establish clear processes for capturing and analyzing this information, creating a continuous learning loop that enhances AI performance over time.

元认知意识:知道自己在思考的人工智能

Metacognitive Awareness: The AI That Knows It’s Thinking

LRM推理中最精妙的方面之一是它展现出的元认知意识——即思考自身思维过程的能力。我们观察到,LRM在以下语句中体现了这一点:“进步需要解决这些差异”以及“这表明说明可能存在印刷错误或不够清晰”。

One of the most sophisticated aspects of the LRM’s reasoning was its demonstration of metacognitive awareness—the ability to think about its own thinking process. We observed this when the LRM wrote: “Progress needs to reconcile these discrepancies” and “This suggests the instructions might be misprinted or lack clarity.”

这种元认知能力一直是人工智能发展中的圣杯。麻省理工学院最近的研究表明,能够监控和调整自身推理过程的系统,在处理复杂任务时通常比那些仅仅执行预设算法的系统表现更好。105

This metacognitive capability has been a holy grail in AI development. Recent research from MIT has shown that systems capable of monitoring and adjusting their own reasoning processes often perform better on complex tasks than those that simply execute predetermined algorithms.105

LRM 的元认知能力令人惊叹,它展现了一种能够以近乎人类的方式反思自身思维的人工智能。它能够主动监控自身的进展,既承认成功也承认挫折,例如:“进展稳步推进,但字母 E 存在冲突,促使我们重新思考 12A 的答案。”当最初的策略失败时,LRM 并没有盲目地加倍投入,而是进行了调整,并表示:“仔细研究。我正在测试 12 横向的第三个字母是否也是 6 纵向的第二个字母。”

The LRM’s metacognitive awareness was a revelation, showcasing an AI capable of reflecting on its own thinking in ways that felt almost human. It actively monitored its progress, acknowledging both successes and setbacks with statements like, “Progress is steady, but there’s a conflict with the letter E, prompting a rethink for 12A.” When an initial strategy fell short, the LRM didn’t double down blindly; it adapted, stating, “Taking a closer look. I’m testing the possibility that the third letter of 12 Across is also the second letter of 6 Down.”

最引人注目的是它识别不确定性的能力,这是高级推理的关键特征。通过识别自身可能正在使用不完整或有缺陷的信息,LRM不仅展现了智能,还展现了一种自我意识的萌芽——这对于应对现实世界复杂问题的AI智能体而言是一次至关重要的飞跃。

Most striking of all was its ability to recognize uncertainty, an essential trait of advanced reasoning. By acknowledging when it might be operating with incomplete or flawed information, the LRM demonstrated not just intelligence but the beginnings of a kind of self-awareness—a critical leap for AI agents navigating the complexities of real-world problems.

这种元认知能力——即思考思考本身——对商业人工智能的应用具有深远的影响。当 LRM 遇到不确定性或潜在错误时,它不会盲目继续,而是会承认自身的局限并调整方法。

This metacognitive capability—thinking about thinking—has profound implications for business AI implementations. When the LRM encountered uncertainty or potential errors, it didn’t blindly proceed but instead acknowledged its limitations and adapted its approach.

对机构而言,这意味着要开发能够评估自身置信水平并识别何时超出其能力范围的人工智能系统。我们曾为一家医疗机构提供咨询,该机构在其诊断人工智能系统中应用了这一原则。他们的智能体旨在为每次诊断提供置信度评分,更重要的是,能够识别出哪些病例超出了其可靠决策的参数范围。

For organizations, this means developing AI systems that can assess their own confidence levels and recognize when they’re operating at the edges of their capabilities. A healthcare organization we advised implemented this principle in their diagnostic AI systems. Their agents were designed to provide confidence scores with each diagnosis and, crucially, to identify when cases fell outside their reliable decision-making parameters.
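Confidence-gated escalation of this kind can be as simple as a threshold rule. The threshold and cases below are invented for illustration, not taken from the healthcare system described:

```python
# Sketch of confidence-gated routing: outputs below a threshold are
# escalated to human review. Threshold and example cases are hypothetical.

REVIEW_THRESHOLD = 0.85

def route(diagnosis: str, confidence: float) -> str:
    """Auto-approve confident outputs; flag the rest for a human."""
    if confidence >= REVIEW_THRESHOLD:
        return f"AUTO: {diagnosis}"
    return f"HUMAN REVIEW: {diagnosis} (confidence {confidence:.2f})"

print(route("condition X", 0.93))
print(route("condition Y", 0.61))
```

In practice the threshold would be calibrated against measured error rates, and the routing decision itself logged so the escalation policy can be audited and refined.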

这种自我认知应延伸至组织层面。企业需要制定清晰的流程,明确人工智能决策何时需要人工审核、如何记录决策过程,以及如何根据绩效反馈调整策略。

This self-awareness should extend to the organizational level. Businesses need clear protocols for when AI decisions require human review, how to document decision-making processes, and how to adjust strategies based on performance feedback.

理解不确定性和局限性

Understanding Uncertainty and Limits

LRM 识别并传达其不确定性的能力,或许为与 AI 代理合作提供了最关键的一课。作为用户,你需要培养一种判断力:何时应该信任人工智能的输出结果,何时需要寻求额外的验证或人工专家意见。

The LRM’s ability to recognize and communicate its uncertainty provides perhaps the most crucial lesson for working with AI agents. As a user, you need to develop a sense of when to trust AI outputs and when to seek additional verification or human expertise.

我们曾咨询过的一位金融分析师开发了一种他称之为“置信度检查”的有效方法。在使用人工智能代理构建金融模型时,他会要求它们解释对分析不同部分的置信度,并指出哪些方面可能需要人工验证。“这就像和初级分析师合作一样,”他解释说,“你需要了解它们知道什么,也需要了解它们不知道什么。”

A financial analyst we advised developed an effective approach he calls “confidence checking.” When working with AI agents on financial models, he asks them to explain their level of confidence in different parts of the analysis and to identify which aspects might need human verification. “It’s like working with a junior analyst,” he explains. “You need to know both what they know and what they don’t know.”

这种意识也体现在你如何提出请求上。与其直接询问确切答案,不如学习询问背后的逻辑和潜在的不确定因素。诸如“哪些因素可能会降低这项建议的可靠性?”或“哪些额外信息有助于使这项分析更加严谨?”之类的问题,往往能带来更周全、更可靠的结果。

This awareness extends to how you frame your requests. Rather than asking for definitive answers, learn to ask for explanations of reasoning and potential areas of uncertainty. Questions like “What factors might make this recommendation less reliable?” or “What additional information would help make this analysis more robust?” can lead to more thoughtful and reliable outcomes.

我们从实验中得出的初步结论

Our first conclusions from the experiment

“真正考验智力的不仅仅是得到正确答案,而是你如何得到答案。” 汤姆是我们的合作作者之一,他曾参与过数十个人工智能项目的实施。他的这番话完美地概括了我们最近使用 LRM 进行的实验中所学到的东西。

“The true test of intelligence isn’t just getting the right answer—it’s how you get there.” This insight from Tom, one of our co-authors who has supported dozens of AI implementations, perfectly captures what we learned from our recent experiment with the LRM.

我们观察到的现象的意义远不止于填字游戏。快速模式匹配人工智能在处理简单、独立的决策时可能令人印象深刻。但是,在许多关键的商业应用中——无论是在医疗保健、金融还是物流领域——能否理清复杂的相互依存关系,往往决定着成败。LRM 的方法揭示了一个至关重要的道理:有时,花时间仔细思考不仅更好,而且至关重要。

The implications of what we were observing went far beyond crossword puzzles. Quick pattern-matching AI can look impressive when dealing with simple, independent decisions. But, in many critical business applications—whether in healthcare, finance, or logistics—the ability to think through complex interdependencies can be the difference between success and costly failure. The LRM’s approach demonstrated something crucial: sometimes, taking time to think carefully isn’t just better—it’s essential.

这项实验揭示了人工智能未来的一个根本性问题:它不仅仅是做出更快的决策,而是做出更好、更深思熟虑的决策。在一个世界里,我们越来越依赖人工智能,理解这种区别可能意味着人工智能系统是真正帮助我们的,还是只会加速我们犯错的。

This experiment revealed something fundamental about the future of AI: it’s not just about making faster decisions, but about making better, more carefully considered ones. In a world increasingly reliant on artificial intelligence, understanding this distinction could mean the difference between AI systems that truly help us and those that simply rush us toward mistakes at higher speeds.

众人拾柴火焰高:人工智能推理中的多智能体系统

The Power of Many: Multi-Agent Systems in AI Reasoning

虽然我们对 LRM 的实验突显了在单个 AI 系统中进行深思熟虑的推理的重要性,但一项更令人着迷的发现出现了:更好的推理往往不是来自给单个 AI 代理更多的时间,而是来自让多个 AI 代理一起推理。

While our experiments with the LRM highlighted the importance of deliberate reasoning in individual AI systems, an even more fascinating discovery has emerged: better reasoning often comes not from giving a single AI agent more time but from enabling multiple AI agents to reason together.

这一洞见从根本上改变了我们对人工智能的认知,挑战了“进步仅仅在于构建更强大的独立系统”这一假设。蒙特利尔大学Mila团队近期开展的研究表明,人工智能代理之间的协作推理甚至可以超越最先进的单个模型。

This insight fundamentally shifts how we perceive artificial intelligence, challenging the assumption that progress is solely about building more powerful standalone systems. Recent research conducted at Mila, the University of Montreal’s AI institute,106 has demonstrated that collaborative reasoning among AI agents can outperform even the most advanced individual models.

这些发现揭示了解决关键挑战(例如速度与信任之间的权衡)的新可能性,同时解锁了依赖于更可靠和更细致的决策的创新商业应用。

These findings reveal new possibilities for addressing critical challenges, such as the speed-trust trade-off, while unlocking innovative business applications that rely on more reliable and nuanced decision-making.

尺度悖论

The Scale Paradox

当研究人员用较小的模型(有些甚至只有 20 亿个参数)测试他们的“辩论框架”时,他们发现了一些惊人的现象:即使是这些较小的模型,在与多样化的同伴进行结构化辩论时,其推理能力也显著提高。107 关键不在于模型的大小,而在于它们架构的多样性以及它们如何挑战彼此的思维。

When researchers tested their ‘debate framework’ with smaller models—some as small as 2 billion parameters—they found something remarkable. Even these smaller models showed significant improvements in reasoning capability when engaged in structured debate with diverse peers.107 The key wasn’t the size of the models, but rather their architectural diversity and the way they challenged each other’s thinking.

这一发现意义深远。它表明,复杂的推理能力并非仅仅源于强大的计算能力,而是可以通过恰当的互动和对话而涌现。不妨将其想象成一群参加研讨会的学生——尽管每个人的知识可能有限,但他们的集体讨论和辩论却能产生超越任何一位知识最渊博的学生独自思考所能获得的洞见。这种“互动涌现”为开发无需庞大计算资源即可高效推理的人工智能系统提供了新的可能性。

This finding has profound implications. It suggests that sophisticated reasoning isn’t just a product of raw computational power, but can emerge through the right kind of interaction and dialogue. Think of it like a group of students in a seminar—while each individual might have limited knowledge, their collective discourse and debate can lead to insights that surpass what even the most knowledgeable individual might achieve alone. This ‘emergence through interaction’ hints at new possibilities for developing AI systems that can reason effectively without requiring massive computational resources.

理解多智能体系统

Understanding Multi-Agent Systems

可以将多智能体系统想象成一个专家小组,他们正在讨论一个复杂的问题。每位专家都带来各自的视角和专业知识,通过结构化的辩论和讨论,他们往往能得出比任何个人单独得出的更好的结论。在人工智能领域,多智能体系统由多个协同工作的AI智能体组成,每个智能体都可能拥有不同的能力、训练或专业知识。108

Think of a multi-agent system as a panel of experts discussing a complex problem. Each expert brings their own perspective and expertise, and through structured debate and discussion, they often reach better conclusions than any individual could reach alone. In AI terms, a multi-agent system consists of multiple AI agents working together, each potentially having different capabilities, training, or specialized knowledge.108

蒙特利尔大学的研究有力地证明了这一原理。<sup>109</sup>当他们在一个结构化的辩论框架下将各种人工智能模型结合起来解决复杂问题时,他们发现了一些非凡的事情:一组中等容量的人工智能模型(包括 Gemini-Pro、Mixtral 8×7B 和 PaLM 2-M)在复杂的数学问题上达到了 91% 的准确率,超越了当时最先进的人工智能系统之一 GPT-4。这并非仅仅是微小的改进,而是我们在理解人工智能推理能力方面取得的根本性突破。

Research from the University of Montreal demonstrated this principle dramatically.109 When they put diverse AI models together in a structured debate framework to solve complex problems, they discovered something remarkable: a group of medium-capacity AI models (including Gemini-Pro, Mixtral 8×7B, and PaLM 2-M) achieved 91% accuracy on complex mathematical problems, outperforming GPT-4, one of the most advanced individual AI systems at that time. This wasn’t just a minor improvement—it represented a fundamental breakthrough in how we think about AI reasoning capabilities.
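The mechanics of such a debate framework can be sketched in a few lines. This is a toy illustration, not the Mila implementation: the `make_agent` stubs and the majority-vote aggregation are our own simplifications, standing in for real models that revise their answers after reading their peers' responses.

```python
from collections import Counter

def debate(agents, question, rounds=2):
    """Run a structured debate: each agent answers, then revises
    after seeing its peers' answers; the majority answer wins."""
    answers = [agent(question, peers=[]) for agent in agents]
    for _ in range(rounds):
        answers = [
            agent(question, peers=answers[:i] + answers[i + 1:])
            for i, agent in enumerate(agents)
        ]
    # Aggregate the final round by majority vote.
    return Counter(answers).most_common(1)[0][0]

def make_agent(default_answer):
    # Stub agent standing in for a model: it holds a tentative answer
    # but revises toward a clear majority among its peers.
    def agent(question, peers):
        if peers:
            majority, count = Counter(peers).most_common(1)[0]
            if count > len(peers) / 2:
                return majority  # defer to the consensus
        return default_answer
    return agent

agents = [make_agent("42"), make_agent("42"), make_agent("41")]
print(debate(agents, "What is 6 * 7?"))  # the dissenting agent converges: 42
```

Even this toy version shows the dynamic the research describes: a wrong individual answer gets corrected through interaction rather than through a bigger model.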

图像

图 6.5:LRM 给出的答案大多是正确的(来源:改编自蒙特利尔大学的研究)

Figure 6.5: The responses given by the LRM are mostly correct (Source: adapted from the University of Montreal’s research)

多智能体系统为我们之前讨论的速度-信任困境提供了一个引人入胜的解决方案。通过让多个智能体并行工作,每个智能体可以以不同的速度和方式运行,这些系统能够结合快速思考和慢速思考的优势。一些智能体可以提供快速的、基于模式的响应,而另一些智能体则可以进行更深层次的推理,这与人类在群体决策中平衡直觉思维和分析思维的方式非常相似。

Multi-agent systems offer an intriguing solution to the speed-trust dilemma we discussed earlier. By having multiple agents work in parallel, each potentially operating at different speeds and with different approaches, these systems can combine the benefits of both fast and slow thinking. Some agents can provide quick, pattern-based responses while others engage in deeper reasoning, much like how humans balance intuitive and analytical thinking in group decision-making.
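A minimal sketch of this fast/slow division of labor might route each task through a cheap pattern-matcher first and escalate only low-confidence cases to a deliberative agent. The agents, the lookup table, and the confidence threshold below are illustrative stand-ins, not a production design.

```python
def fast_agent(task):
    # Cheap, pattern-based lookup: instant, but only covers known cases.
    known = {"2+2": "4", "capital of France": "Paris"}
    answer = known.get(task)
    confidence = 0.95 if answer else 0.2
    return answer, confidence

def slow_agent(task):
    # Stand-in for slower, multi-step deliberative reasoning.
    return f"reasoned answer for {task!r}", 0.9

def route(task, threshold=0.8):
    """Try the fast path first; escalate to slow reasoning when the
    fast agent's confidence falls below the trust threshold."""
    answer, confidence = fast_agent(task)
    if confidence >= threshold:
        return answer, "fast"
    answer, _ = slow_agent(task)
    return answer, "slow"

print(route("2+2"))                  # handled on the fast path
print(route("plan a supply chain"))  # escalated to the slow agent
```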

集体智慧的出现

The Emergence of Collective Intelligence

我们越深入研究多智能体系统,就越会发现它与人类群体动力学之间存在着令人惊讶的相似之处。蒙特利尔研究中最令人着迷的发现是,当单个人工智能模型信心十足但却犯错时,通过辩论引入不同的观点,往往会使它们在得出更好的答案之前适当地降低信心。

The deeper we look into multi-agent systems, the more we discover surprising parallels with human group dynamics. One of the fascinating findings from the Montreal research is that when individual AI models were highly confident but wrong, the introduction of different perspectives through debate often led them to appropriately reduce their confidence before arriving at better answers.

这反映了人类专家推理的一个关键方面——能够识别何时应该质疑那些看似确信的答案。例如,在医学诊断中,最危险的错误往往源于过早的确定性。正如多位医生会诊疑难病例可以带来更恰当的谨慎一样,人工智能体之间的讨论也能促进对不确定性更细致入微的处理。

This mirrors a crucial aspect of human expert reasoning—the ability to recognize when confident answers should be questioned. For instance, in medical diagnosis, the most dangerous errors often come from premature certainty. Just as having multiple doctors confer on a difficult case can lead to more appropriate caution, the debate between AI agents fosters a more nuanced handling of uncertainty.

更令人着迷的是,多智能体改进并非仅仅是优势的叠加,而是创造全新的事物。蒙特利尔的研究表明,当多个相同的AI模型相互辩论时,性能仅略有提升(从78%提升到82%)。然而,当不同的模型进行结构化辩论时,性能却取得了显著提升,在复杂的数学问题上达到了91%的准确率。

What makes this even more intriguing is that multi-agent improvements aren’t just about combining strengths—they’re about creating something entirely new. The Montreal research revealed that when multiple copies of the same AI model debated with each other, performance only improved modestly (from 78% to 82%). However, when diverse models engaged in structured debate, they achieved dramatic improvements, reaching 91% accuracy on complex mathematical problems.

这揭示了一个至关重要的洞见:人工智能系统和人类一样,也会陷入自身的“思维模式”中。只有当采用不同方法和架构训练的不同模型融合在一起时,我们才能看到真正的智力进步。

This reveals a crucial insight: AI systems, like humans, can get stuck in their own “thought patterns.” It’s only when different models, trained with different approaches and architectures, come together that we see true intellectual progress emerge.

认知多样性的力量

The Power of Cognitive Diversity

是什么让多智能体系统如此高效?关键在于研究人员所说的“思维多样性”。110 正如人类团队受益于认知多样性一样,人工智能系统在运用不同的问题解决方法时也能表现得更好。一个经过不同训练的系统,即使整体能力不如其他系统,或许能发现更复杂的系统所忽略的模式或可能性。

What makes multi-agent systems so effective? The key lies in what researchers call “diversity of thought.”110 Just as human teams benefit from cognitive diversity, AI systems perform better when they bring different approaches to problem-solving. A system trained differently, even if it’s not as powerful overall, might spot patterns or possibilities that a more sophisticated system misses.

蒙特利尔大学的研究强调了这一原则,该研究表明不同的AI智能体如何能够互补彼此的弱点。例如,由于它们可以使用的工具或可以获取的知识不同,一个智能体可能擅长模式识别,而另一个智能体则更擅长逻辑推理。当它们协同工作时,就会产生研究人员所说的涌现推理能力——这种能力并非任何单个智能体独有,而是在它们之间的交互中涌现出来的。111

This principle was highlighted in research from the University of Montreal, which demonstrated how different AI agents can complement each other’s weaknesses. For instance, due to the tools they can use or the knowledge they can access, one agent might excel at pattern recognition while another is better at logical deduction. When working together, they create what researchers call emergent reasoning capabilities: abilities that don’t exist in any individual agent but emerge from their interaction.111

多智能体系统的有效性不仅仅在于拥有多个模型,更在于它们如何随着时间的推移相互协作。研究揭示了所谓的“师生效应”。<sup>112</sup>当研究人员将能力更强的AI模型与能力较弱的模型配对进行辩论时,发生了一件令人瞩目的事情。能力较弱的模型在推理能力方面表现出快速提升,其性能水平往往远远超出其通常的能力范围。结果,整个系统的性能得到了提升,这充分展现了结构化交互在多智能体系统中的强大作用。

The effectiveness of multi-agent systems isn’t just about having multiple models—it’s about how they engage with each other over time. The research revealed what they called the “teacher-student effect.”112 When researchers paired more capable AI models with less capable ones in debate scenarios, something remarkable happened. The less capable models showed rapid improvement in their reasoning abilities, often achieving performance levels far beyond their typical capabilities. As a result, the overall system performance was enhanced, illustrating the power of structured interactions in multi-agent systems.

商业影响

The Business Impact

对于企业而言,多智能体系统的影响是深远的。通过我们的咨询工作,我们发现多智能体系统在三个主要领域能够创造显著价值。

For businesses, the implications of multi-agent systems are profound. Through our consulting work, we’ve seen three primary areas where multi-agent systems create significant value.

多智能体系统最显著的优势之一在于其能够增强复杂环境下的决策能力。例如,在供应链管理或风险评估中,部署多个智能体可以让每个智能体从独特的角度分析问题,从而得出更全面的解决方案。微软的 Magentic-One 框架进一步强调了这一原则,该框架表明,与单智能体系统相比,多智能体系统在预测和缓解中断方面更为有效。<sup>113</sup>我们在与一家全球制药公司的咨询工作中也证实了这些发现,实施多智能体方法进行供应链优化后,与中断相关的损失减少了 35%。试想一下,在一个供应链中,一个智能体预测需求激增,另一个智能体监控地缘政治风险,两者相互补充,共同构建一个具有韧性的战略。

One of the most striking advantages lies in their ability to enhance decision-making in complex environments. For example, in supply chain management or risk assessment, deploying multiple agents allows each to analyze the problem from a unique angle, leading to more comprehensive solutions. This principle is further emphasized by Microsoft’s Magentic-One framework, which revealed that multi-agent systems were more effective at predicting and mitigating disruptions compared to their single-agent counterparts.113 We’ve confirmed these findings in our own consulting work with a global pharmaceutical company, where implementing a multi-agent approach for supply chain optimization reduced disruption-related losses by 35%. Imagine a supply chain where one agent anticipates demand surges while another monitors geopolitical risks, each reinforcing the other’s insights to create a resilient strategy.

第二个突破在于其更强大的错误检测能力。与单一系统不同,多智能体框架善于通过辩论和挑战来识别推理中的缺陷。麻省理工学院和谷歌大脑的研究发现,当人工智能智能体被设计成能够质疑和完善彼此的结论时,错误率降低了22%以上。<sup>114</sup>在我们为金融服务公司实施多智能体系统的工作中,我们持续观察到错误率的降低,这验证了上述研究结果。这种协作机制类似于人类的同行评审过程,不同的视角能够增强最终结果。对于企业而言,这意味着无论是在产品设计、财务预测还是运营物流方面,都能减少代价高昂的错误。

The second breakthrough lies in their capacity for improved error detection. Unlike a solitary system, multi-agent frameworks thrive on debate and challenge, effectively identifying flaws in reasoning. Research from MIT and Google Brain found that when AI agents were designed to question and refine each other’s conclusions, error rates were reduced by over 22%.114 In our work implementing multi-agent systems for financial services firms, we’ve consistently seen error reduction rates that validate these research findings. This collaborative dynamic mirrors the human process of peer review, where diverse perspectives strengthen the final outcome. For businesses, this means fewer costly mistakes, whether in product design, financial forecasting, or operational logistics.

或许最令人兴奋的是它们的适应能力。多智能体系统在面对新挑战时表现出非凡的韧性。最近,一个研究团队证明,由不同智能体组成的群体能够解决以前从未遇到过的问题,其性能通常优于依赖预训练单一模型的系统。<sup>115</sup>这种适应性使它们在动态市场或危机情况下发挥着不可估量的作用,因为在这些情况下,调整策略和创新能力是生存的关键。从优化营销策略到应对监管变化,多智能体系统展现出的稳健性几乎堪称进化而来。

Perhaps most exciting is their adaptability. Multi-agent systems exhibit remarkable resilience when faced with novel challenges. A research team recently demonstrated that diverse groups of agents could tackle previously unseen problems, often outperforming systems that relied on pre-trained, single models.115 This adaptability makes them invaluable in dynamic markets or crisis situations, where the ability to pivot and innovate is key to survival. From optimizing marketing strategies to navigating regulatory changes, multi-agent systems offer a robustness that feels almost evolutionary.

级联效应的挑战

The Challenge of Cascading Effects

尽管多智能体系统具有强大的优势,但也带来了独特的挑战,需要认真考虑。一些研究揭示了一个关键现象:在人工智能智能体网络中,推理错误并非简单地累积,而是通过我们称之为网络效应<sup>116</sup>或错误复合影响<sup>117</sup>而成倍增长。

While multi-agent systems offer powerful benefits, they also introduce unique challenges that demand careful consideration. A few studies revealed a critical phenomenon: in networks of AI agents, reasoning errors don’t simply accumulate—they multiply through what we can call a network effect116 or compound impact of errors.117

我们在与一家大型电信公司合作的过程中亲身经历了这种现象。该公司部署了一套复杂的智能体系统来管理其网络运营。该系统采用了多个专业化的AI智能体,每个智能体负责不同的方面,例如负载均衡、安全监控、资源分配、维护调度和用户体验优化。虽然每个智能体单独运行时都表现出色,但它们之间的相互连接却带来了意想不到的安全漏洞。

We witnessed this phenomenon firsthand during our work with a major telecommunications company that had implemented a sophisticated multi-agent system to manage their network operations. Their system employed multiple specialized AI agents, each responsible for different aspects like load balancing, security monitoring, resource allocation, maintenance scheduling, and user experience optimization. While each agent performed admirably in isolation, their interconnected nature created unexpected vulnerabilities.

在一个极具启发性的案例中,一个代理在评估网络容量时犯了一个看似微不足道的错误。这个看似微小的错误引发了一连串相互关联的决策:负载均衡系统误判了可用容量,导致资源分配代理做出次优决策。这反过来又导致维护调度代理重新安排关键更新,进而触发安全监控系统发出误报。最终,用户体验优化代理做出了适得其反的调整。最初看似微小的推理错误,通过这一系列相互关联的决策,最终演变成一次严重的服务中断。

In one particularly instructive incident, a single agent made what seemed like a minor error in assessing network capacity. This seemingly small mistake triggered a cascade of interdependent decisions: the load balancing system misinterpreted available capacity, leading the resource allocation agent to make suboptimal decisions. This, in turn, caused the maintenance scheduling agent to reschedule critical updates, which prompted the security monitoring system to flag false positives. The chain reaction culminated in the user experience optimization agent making counterproductive adjustments. What began as a small reasoning error amplified into a significant service disruption through this cascade of interrelated decisions.

这次经历让我们对关键应用中多智能体系统的管理有了宝贵的认识。通过跨行业的实施工作,我们建立了一个框架来防止此类级联故障。关键在于实施我们称之为“推理检查点”的机制——预先设定的关键点,在这些点上,关键决策需要经过多层验证。这些检查点与“断路器”协同工作,断路器是指在决策传播到整个系统之前触发人工验证的特定条件。我们将在第8章中详细阐述这些方面。

This experience taught us valuable lessons about managing multi-agent systems in critical applications. Through our work across industries, we’ve developed a framework for preventing such cascade failures. The key lies in implementing what we call “reasoning checkpoints”—predetermined points where critical decisions require multiple levels of validation. These checkpoints work alongside “circuit breakers,” specific conditions that trigger human validation before decisions can propagate through the system. We provide more detail on these aspects in Chapter 8.
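The checkpoint-and-breaker idea can be illustrated with a small sketch. The `Checkpoint` class, the capacity-delta threshold, and the stand-in human reviewer are all hypothetical names of our own; they simply show how a breaker condition can divert a risky decision to human validation before it propagates downstream.

```python
class Checkpoint:
    """A reasoning checkpoint: a decision must clear every automated
    validator; a tripped circuit breaker escalates it to a human."""
    def __init__(self, validators, breaker, human_review):
        self.validators = validators      # automated validation layers
        self.breaker = breaker            # condition forcing human review
        self.human_review = human_review  # callable: decision -> bool

    def approve(self, decision):
        if self.breaker(decision):
            # Circuit breaker tripped: the decision cannot propagate
            # without explicit human sign-off.
            return self.human_review(decision)
        return all(v(decision) for v in self.validators)

# Hypothetical capacity decision from a network-management agent.
decision = {"action": "reallocate", "capacity_delta": 0.4}

checkpoint = Checkpoint(
    validators=[lambda d: abs(d["capacity_delta"]) < 0.5],
    breaker=lambda d: abs(d["capacity_delta"]) >= 0.25,  # large shifts need a human
    human_review=lambda d: False,  # stand-in: the reviewer rejects it
)
print(checkpoint.approve(decision))  # False: blocked before it can cascade
```

Small, routine decisions pass through the validators untouched; only the ones matching the breaker condition pay the latency cost of human review.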

基于这些经验,我们发现用户需要敏锐地意识到潜在的连锁反应。高效的用户会学习:

Drawing from this experience, we’ve found that users need to develop a keen awareness of potential cascade effects. Effective users learn to:

定期要求代理人解释他们对其他代理人输出的依赖关系

Regularly ask agents to explain their dependencies on other agents’ outputs

请求定期进行系统范围一致性检查

Request periodic system-wide consistency checks

设立明确的检查点,以便人工验证关键决策。

Set up explicit checkpoints for human validation of critical decisions

监控整个系统中是否存在错误放大的迹象

Monitor for signs of error amplification across the system

另一项关键保障措施是实施独立的验证协议,即由不同的AI系统运用不同的方法交叉验证彼此的推理。这种方法有助于在潜在错误扩散到整个系统之前将其发现。此外,我们发现,能够实时追踪决策后续影响的强大反馈监控系统,对于及早发现潜在的级联故障至关重要。

Another crucial safeguard is the implementation of independent verification protocols, where separate AI systems cross-check each other’s reasoning using different methodologies. This approach helps catch potential errors before they can propagate through the system. Additionally, we’ve found that robust feedback monitoring systems, which track the downstream effects of decisions in real-time, are essential for early detection of potential cascade failures.
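A toy version of independent verification: two solvers compute the same quantity by different methods, and an answer is accepted only when they agree. The trivially simple solvers below are deliberate stand-ins for separate AI systems with different methodologies.

```python
def solver_a(x):
    # Methodology A: direct summation.
    return sum(range(1, x + 1))

def solver_b(x):
    # Methodology B: independent closed-form formula for the same quantity.
    return x * (x + 1) // 2

def cross_checked(x, tolerance=0):
    """Accept an answer only when two independently derived results
    agree; disagreement flags a potential error before it propagates."""
    a, b = solver_a(x), solver_b(x)
    if abs(a - b) <= tolerance:
        return a
    raise ValueError(f"verification failed: {a} != {b}")

print(cross_checked(100))  # 5050, confirmed by both methods
```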

这些安全措施不仅能防止故障发生,还能提升多智能体系统的整体可靠性。通过精心管理智能体之间的交互并实施适当的制衡机制,组织既能充分发挥多智能体推理的强大功能,又能最大限度地降低级联故障的风险。这种平衡的方法已被证明对于在关键业务应用中成功部署多智能体系统至关重要。

These safeguards don’t just prevent failures—they also enhance the overall reliability of multi-agent systems. By carefully managing the interactions between agents and implementing appropriate checks and balances, organizations can harness the power of multi-agent reasoning while minimizing the risks of cascading failures. This balanced approach has proven essential for successful deployment of multi-agent systems in critical business applications.

展望未来:人工智能推理的下一个演进阶段

Looking to the Future: The Next Evolution in AI Reasoning

我们探索人工智能推理的旅程始于一个简单的填字游戏,却引领我们对人工智能的本质及其未来发展有了深刻的理解。回顾我们所学到的知识,我们对人工智能推理的理解出现了三个根本性的转变——这些转变将在未来几年重塑我们思考和应用人工智能的方式。

Our journey exploring AI reasoning began with a simple crossword puzzle but led us to profound insights into the nature of artificial intelligence and its future evolution. As we step back to consider what we’ve learned, three fundamental shifts in our understanding of AI reasoning emerge—shifts that will reshape how we think about and implement AI in the years ahead.

第一个转变挑战了我们对人工智能速度和性能的基本假设。在我们的实验和实施过程中,我们发现,有效的人工智能推理并非取决于强大的处理能力或瞬时响应,而是遵循一种自然的思维节奏——一种在快速模式识别和更深入、更审慎的分析之间交替的韵律。

The first shift challenges our basic assumptions about AI speed and performance. Throughout our experiments and implementations, we’ve discovered that effective AI reasoning isn’t about raw processing power or instantaneous responses. Instead, it follows a natural rhythm of thought—a cadence that alternates between quick pattern recognition and deeper, more deliberative analysis.

这种节奏反映了智能本身的一些基本特性。正如人类认知进化是为了平衡快速反应思维和较慢的分析推理一样,人工智能系统也开始发展出自己的认知节奏。LRM 在我们填字游戏实验中的成功,不仅仅在于其卓越的处理能力,更在于它能够根据当前挑战的复杂性来调节思维速度。

This rhythm mirrors something fundamental about intelligence itself. Just as human cognition evolved to balance quick reactive thinking with slower analytical reasoning, AI systems are beginning to develop their own cognitive rhythms. The success of the LRM in our crossword experiment wasn’t just about its superior processing—it was about its ability to modulate its thinking speed based on the complexity of the challenge at hand.

我们对多智能体系统的探索揭示了第二个关键洞见:人工智能推理的未来或许不在于构建越来越庞大的个体模型,而在于促进不同认知方法之间的有效互动。这一原则——即即使单个视角存在局限性,多个视角也能结合起来产生更深刻的理解——似乎是所有智能系统(无论是人类还是人工智能)的普遍特征。

Our exploration of multi-agent systems revealed a second crucial insight: the future of AI reasoning may not lie in building ever-larger individual models but in fostering productive interaction between diverse cognitive approaches. This principle—that multiple perspectives, even if individually limited, can combine to create superior understanding—seems to be a universal feature of intelligent systems, whether human or artificial.

或许最令人惊讶的是,我们的研究使我们对人类在人工智能推理中的作用有了更深刻的理解。先进的人工智能系统非但没有使人类判断过时,反而似乎需要更复杂的人类监督和交互方式。我们所见到的最成功的应用并没有减少人类的参与——它们改变了人类的参与方式,将人类从单纯的操作者提升为我们所谓的“认知编舞者”,协调不同人工智能功能之间的交互,并确保它们与人类的价值观和目标保持一致。

Perhaps most surprisingly, our research has led us to a deeper appreciation of the human role in AI reasoning. Far from making human judgment obsolete, advanced AI systems seem to demand more sophisticated forms of human oversight and interaction. The most successful implementations we’ve seen don’t minimize human involvement—they transform it, elevating humans from mere operators to what we might call “cognitive choreographers,” orchestrating the interaction between different AI capabilities and ensuring their alignment with human values and objectives.

***

***

在我们探索人工智能代理如何进行复杂决策的过程中,一个问题不断涌现:这些系统如何从经验中学习?

As we’ve explored how AI agents reason through complex decisions, one question keeps emerging: How do these systems learn from their experiences?

推理能力使人工智能能够即时做出智能决策,而记忆则使其能够从这些经验中汲取养分,随着时间的推移变得更加智能。如果没有记忆,即使是最先进的推理能力也会永远停留在当下,无法从过去的成功中学习,也无法避免重蹈覆辙。

While reasoning enables AI to make intelligent decisions in the moment, memory allows it to build upon these experiences, growing smarter over time. Without memory, even the most sophisticated reasoning capabilities remain trapped in an eternal present, unable to learn from past successes or avoid repeated mistakes.

当我们转而探索记忆——人工智能代理的第三个关键要素——时,我们将发现各个组织如何构建不仅会思考、还会学习和成长的系统。未来的探索不仅将揭示人工智能的记忆机制,更将揭示记忆如何将这些系统从工具转变为助力企业成功的真正伙伴。

As we turn to explore memory—the third keystone of AI agents—we’ll discover how organizations are building systems that don’t just think but learn and grow. The journey ahead will reveal not just how memory works in AI, but how it transforms these systems from tools into true partners in business success.

第七章

CHAPTER 7

记忆:构建会学习的人工智能

MEMORY: BUILDING AI THAT LEARNS

想象一下,每天醒来都完全失忆——无法记起过去的经历、喜好或已习得的技能。你会如何生活?你会如何成长?这个思想实验直击当今人工智能领域最引人入胜的挑战之一:记忆。我们所使用的大多数生成式人工智能系统,每次使用时都相当于从头开始,这种“人工失忆”限制了它们的真正潜力。

Imagine starting each day with complete amnesia—unable to remember your past experiences, preferences, or learned skills. How would you function? How would you grow? This thought experiment cuts to the heart of one of the most fascinating challenges in artificial intelligence today: memory. Most generative AI systems we interact with essentially start fresh each time we use them, operating with a form of artificial amnesia that limits their true potential.

我们在与一家全球电信公司合作的过程中,对这种局限性有了最深刻的体会。该公司投入数百万美元开发先进的人工智能客服聊天机器人,但客户满意度却始终低迷。原因何在?人工智能会忘记之前与客户的互动,迫使客户反复重复他们的问题和偏好。正如该公司一位领导在项目进行期间所说:“这就像一个只有两分钟记忆力的客服代表。”

One of our most eye-opening experiences with this limitation came during our work with a global telecommunication company. They had invested millions in advanced AI customer service chatbots, yet customer satisfaction remained stubbornly low. The reason? The AI would forget previous interactions with customers, forcing them to repeat their issues and preferences repeatedly. As one of the company’s leaders noted during our project, “It’s like having a customer service representative with a two-minute memory span.”

这一挑战凸显了我们多年来在各组织机构部署人工智能系统过程中发现的一个关键真理:记忆不仅仅是智能的一个特征,它更是智能的基础。所有有意义的智能都建立在经验之上。无论是人类还是机器,保留、组织和利用过往经验的能力塑造了我们学习、适应和成长的方方面面。

This challenge highlights a crucial truth we’ve discovered through years of implementing AI systems across organizations: Memory isn’t just a feature of intelligence—it’s the foundation upon which all meaningful intelligence is built. Whether in humans or machines, the ability to retain, organize, and utilize past experiences shapes every aspect of how we learn, adapt, and grow.

在本章中,我们将带您探索人工智能记忆的奇妙世界,揭示这项基础能力如何变革商业和技术。您将了解不同类型的记忆——从短期处理到长期记忆——如何协同工作,构建真正智能的系统。通过真实案例和实际应用,我们将探讨如何构建不仅能存储信息,还能在每次交互中不断成长的人工智能系统。

In this chapter, we’ll take you on a journey through the fascinating world of AI memory, revealing how this fundamental capability is transforming business and technology. You’ll discover how different types of memory—from short-term processing to long-term retention—work together to create truly intelligent systems. Through real-world examples and practical implementations, we’ll explore how to build AI systems that don’t just store information but grow smarter with every interaction.

您将了解许多记忆实现方案失败的原因,以及避免这些陷阱的有效策略。我们将深入探讨记忆与遗忘之间的关键平衡,以及这对您的业务为何至关重要。接下来的旅程将挑战您对人工智能能力的固有认知,并向您展示支持记忆功能的系统如何重塑商业和技术的未来。

You’ll learn why many memory implementations fail and the proven strategies to avoid these pitfalls. We’ll delve into the critical balance between remembering and forgetting, and why this matters for your business. The journey ahead will challenge your assumptions about what AI can achieve and show you how memory-enabled systems are reshaping the future of business and technology.

记忆是智力的基础

Memory is a Foundation of Intelligence

想想你最早的童年记忆。也许是生日派对,奶奶厨房的香味,或是学会骑自行车的情景。现在问问自己:你真的记得这件事本身,还是记得你最后一次想起它时的情景?这个问题看似哲学,却直指人类记忆运作的本质——以及它与我们今天构建的人工记忆系统为何如此不同。

Think about your earliest childhood memory. Maybe it’s a birthday party, the smell of your grandmother’s kitchen, or learning to ride a bike. Now ask yourself: are you really remembering the original event, or are you remembering the last time you remembered it? This question, which might seem philosophical, cuts to the heart of how human memory actually works—and why it’s so different from the artificial memory systems we’re building today.

多年来,我们在各个组织机构部署人工智能系统的过程中发现,了解人类记忆有助于人们理解人工智能的潜力和局限性。让我们探索一下我们自身的思维是如何运作的,因为它对人工智能的未来有着令人着迷的启示。

During our years of implementing AI systems across organizations, we’ve found that understanding human memory helps people grasp both the potential and limitations of artificial intelligence. Let’s explore how our own minds work, as it holds fascinating implications for the future of AI.

你的大脑是一个不可思议的信息处理器。此刻,当你阅读这些文字时,它正在处理大约1100万比特的信息,但你只能有意识地感知到其中的40到50比特。这种选择性意识揭示了我们大脑运作方式的一个关键要素:记忆并非存储所有信息,而是存储那些真正重要的信息。<sup>118</sup>

Your brain is an incredible information processor. Right now, as you read these words, it’s processing about 11 million bits of information, yet you’re only consciously aware of about 40 to 50 bits. This selective awareness reveals something crucial about how our minds work: memory isn’t about storing everything—it’s about storing what matters.118

想想你上次开车上班的情景。你可能记不清大部分路程,但如果发生了什么不寻常的事情,比如有鹿横穿马路,你肯定会记得清清楚楚。这并非人类记忆的缺陷,而是一种特性。我们的大脑非常擅长过滤掉日常信息,同时突出重要或不寻常的信息。

Think of the last time you drove to work. You probably don’t remember most of the journey, but you’d certainly remember if something unusual happened, like a deer crossing the road. This isn’t a bug in human memory; it’s a feature. Our brains are remarkably efficient at filtering out routine information while highlighting what’s important or unusual.

科学家发现我们拥有不同类型的记忆系统,每种系统都有其不同的用途。<sup>119</sup>其中就包括工作记忆(也称短期记忆)——相当于电脑的内存——它一次大约可以存储七条信息。这就是为什么传统的电话号码是七位数,也是为什么你可能难以记住一长串指令的原因。<sup>120</sup>

Scientists have found that we have different types of memory systems, each serving different purposes.119 There’s working memory (also called short-term memory)—your mind’s equivalent of a computer’s RAM—which can hold about seven pieces of information at once. This is why phone numbers were traditionally seven digits long, and it’s why you might struggle to remember a long sequence of instructions.120

其次是长期记忆,它更为复杂,由三种主要类型的记忆构成。还记得你永远不会忘记如何骑自行车吗?这就是程序性记忆在起作用。你还能回忆起你的婚礼当天或你的第一次工作面试?那是情景记忆。你知道巴黎是法国首都,即使你从未去过那里?那是语义记忆。121

Then, there’s long-term memory, which is far more complex and composed of three main types of memories. Remember how you never forget how to ride a bike? That’s procedural memory at work. The ability to recall your wedding day or your first job interview? That’s episodic memory. The fact that you know Paris is the capital of France, even if you’ve never been there? That’s semantic memory.121

最令人着迷的是这些系统是如何协同工作的。当你烹饪一道熟悉的菜肴时,你会同时运用程序性记忆(烹饪技巧)、语义记忆(知道哪些食材搭配在一起效果更好)和情景记忆(记住上次做这道菜的情况以及哪些步骤成功了,哪些步骤失败了)。

What’s fascinating is how these systems work together. When you’re cooking a familiar recipe, you’re simultaneously using procedural memory (cooking techniques), semantic memory (knowing what ingredients work together), and episodic memory (remembering the last time you made this dish and what worked or didn’t work).

但真正有趣的是:每次回忆时,你并非像打开电脑文件那样调出精确的记录。相反,你是在重构记忆,每次都可能略有不同。伊丽莎白·洛夫特斯的开创性研究证实了这一点,她发现,仅仅通过提问的方式,就能微妙地改变记忆。<sup>122</sup>

But here’s what’s really interesting: every time you recall a memory, you’re not pulling up an exact recording like a computer file. Instead, you’re reconstructing it, potentially with slight variations each time. This was demonstrated in groundbreaking research by Elizabeth Loftus, who showed how memories can be subtly altered simply by how questions about them are asked.122

在一项研究中,洛夫特斯向参与者播放了一段车祸视频,然后询问他们“两辆车相撞时的速度是多少?”和“两辆车碰撞时的速度是多少?”。听到“撞碎”这个词的参与者回忆起的速度更高,甚至报告说看到了碎玻璃——而实际上并没有。这项引人入胜的实验凸显了词语选择如何重塑记忆。它表明,人类的记忆是重构,而非记录。

In one study, Loftus showed participants a video of a car accident and then asked, “How fast were the cars going when they smashed into each other?” versus “How fast were the cars going when they hit each other?” Those who heard “smashed” recalled higher speeds and even reported seeing broken glass—when none was present. This fascinating experiment highlights how word choice can reshape memory. It demonstrates that human memories are reconstructions, not recordings.

记忆的这种重构特性,虽然有时并不可靠,却赋予了我们非凡的思维灵活性和问题解决能力。它使我们能够通过重组过去经验的要素来想象新的场景——这种能力是记忆结构较为僵化的当前人工智能系统仍难以复制的。

This reconstructive nature of memory, while sometimes unreliable, gives us remarkable flexibility in thinking and problem-solving. It allows us to imagine new scenarios by recombining elements of past experiences—a capability that current AI systems, with their more rigid memory structures, still struggle to replicate.

情绪也扮演着至关重要的角色。想想你听到重大世界事件(例如总统选举或全球危机)时身在何处。你很可能记得很清楚,因为人们通常更容易记住带有情绪的体验,而不是中性的体验。这种情绪标记有助于我们优先处理重要信息并做出更明智的决策——我们仍在努力理解这一点,并将其应用于人工智能系统。

Emotions play a crucial role, too. Think about where you were when you heard about a major world event, like a presidential election or a global crisis. You probably remember it clearly because emotional experiences are generally better remembered than neutral ones. This emotional tagging helps us prioritize important information and make better decisions—something we’re still working to understand and implement in AI systems.

理解人类记忆的这些方面有助于解释开发人工智能体所面临的挑战和机遇。虽然我们可以创建能够完美精确地存储海量信息的系统——这是我们的大脑无法做到的——但我们距离复制人类处理和使用记忆的灵活、情境敏感和情感智能的方式还很遥远。

Understanding these aspects of human memory helps explain both the challenges and opportunities in developing AI agents. While we can create systems that store vast amounts of information with perfect accuracy—something our brains can’t do—we’re still far from replicating the flexible, context-sensitive, and emotionally intelligent way humans process and use memories.

这不仅仅是学术知识,它对我们如何设计和使用人工智能系统具有实际意义。当我们认识到人类记忆的本质在于建立有意义的联系,而非存储完美的记录时,我们就能更好地理解人工智能系统的目标:不仅仅是更大的存储空间,而是更智能、更具情境性的信息使用方式。

This isn’t just academic knowledge—it has practical implications for how we design and use AI systems. When we recognize that human memory is more about making meaningful connections than storing perfect records, we can better understand what we should aim for in artificial systems: not just more storage but smarter, more contextual ways of using information.

人工智能记忆的惊人真相

The Surprising Reality of AI Memory

这就引出了与现有AI系统的一个有趣的对比。我们注意到一个常见的误解:许多人害怕与ChatGPT、Gemini或Claude等大型语言模型(LLM)互动,因为他们认为这些AI系统会不断学习并记住互动中的所有信息,从而构建一个不断增长的知识库。但事实远比这令人惊讶。

This brings us to a fascinating contrast with current AI systems. We’ve noticed a common misconception: Many people are scared to interact with large language models (LLMs) such as ChatGPT, Gemini, or Claude because they believe these AI systems are constantly learning and remembering everything from their interactions, building an ever-growing knowledge base. The reality is far more surprising.

大型语言模型(LLM)更像是极其精密的回音室,只有有限的临时记忆。可以这样理解:当你与人工智能开始对话时,就像打开了一本页数固定的空白笔记本。你们讨论的所有内容都会记录在这个笔记本里,人工智能可以随时引用其中的任何部分——但仅限于对话结束前。一旦开始新的聊天,就相当于拿到一本全新的笔记本,之前的内容将不复存在。

LLMs are more like extremely sophisticated echo chambers with limited temporary memory. Think of it this way: When you start a conversation with an AI, it’s like opening a blank notebook with a fixed number of pages. Everything you discuss gets written in this notebook, and the AI can reference any part of it freely—but only until you close the conversation. Once you start a new chat, it’s like getting a fresh notebook with no trace of what was written in the previous one.

想亲自测试一下吗?试试这个简单的实验,可以使用任何人工智能聊天界面,例如 ChatGPT 或 Claude:

Want to test this yourself? Try this simple experiment with any AI chat interface, such as ChatGPT or Claude:

1.问一些你喜欢的话题,例如“在塔希提岛潜水感觉如何?”

1. Ask it about a topic that you like, such as “How is scuba diving in Tahiti?”

2.询问更多关于潜水点和那里可以看到的鱼类的信息。

2. Ask for more information about the diving spots and the types of fish you can see there

3.关闭聊天会话

3. Close the chat session

4.立即打开一个新的聊天窗口,问它:“你一分钟前跟我说的关于塔希提岛的事情是什么?”

4. Open a new chat session immediately and ask it, “What did you tell me about Tahiti one minute ago?”

以下是聊天机器人回复我们的内容:“我不记得最近提到过塔希提岛。您能具体说明一下您想了解什么吗?您是想咨询旅行建议、历史、文化,还是其他方面?”

Here is what the chatbot replied to us: “I don’t recall mentioning Tahiti recently. Could you clarify what you’re looking for? Are you asking about travel tips, history, culture, or something else?”

我们建议您尝试一下。人工智能很可能会承认它什么都不记得了,或者给出一些笼统的回答。这并非人工智能处理能力的缺陷,而是目前大多数人工智能系统处理记忆方式的根本局限。这里的关键教训是:作为目前生产环境中最先进的智能体(3级智能体)基础的大型语言模型(LLM),其记忆容量竟然和金鱼一样少!

We suggest you try it. Most likely, the AI will either admit it doesn’t remember anything or make generic statements. This isn’t a flaw in the AI’s processing power—it’s a fundamental limitation in how most current AI systems handle memory. The key lesson here is that LLMs, which are the foundations of the most advanced agents currently in production (level 3 agents), have as much memory as a goldfish!
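The "fresh notebook" behavior is easy to reproduce in code: a chat model's only "memory" is the message list resent with each request. The `toy_model` below is our own stand-in for an LLM API; it can only reference facts that appear in the transcript it is handed on that call.

```python
def chat_turn(history, user_message, model):
    """One turn against a stateless model: the model sees only the
    messages passed in; its 'memory' is just the history we resend."""
    history = history + [{"role": "user", "content": user_message}]
    reply = model(history)
    return history + [{"role": "assistant", "content": reply}]

def toy_model(messages):
    # Toy stand-in for an LLM: it can only recall facts present in
    # the transcript it receives on this call.
    transcript = " ".join(m["content"] for m in messages)
    if "Tahiti" in transcript:
        return "We discussed Tahiti."
    return "I don't recall mentioning Tahiti."

# Session 1: the follow-up works because the history is resent.
session_1 = chat_turn([], "How is scuba diving in Tahiti?", toy_model)
session_1 = chat_turn(session_1, "What did you tell me about?", toy_model)
print(session_1[-1]["content"])  # "We discussed Tahiti."

# Session 2: a fresh notebook; nothing from session 1 carries over.
session_2 = chat_turn([], "What did you tell me a minute ago?", toy_model)
print(session_2[-1]["content"])  # "I don't recall mentioning Tahiti."
```

Real chat products work the same way underneath: continuity within a session comes from resending the transcript, not from the model remembering anything.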

记忆挑战实践

The Memory Challenge in Practice

为了说明这一限制的实际影响,我们不妨考虑一下我们与一家大型医疗服务提供商合作开展的一项实验。他们实施了两个版本的AI排班助手:

To illustrate the practical impact of this limitation, consider an experiment we conducted with a major healthcare provider. They implemented two versions of an AI scheduling assistant:

一个标准的基于LLM的系统,每次交互都从头开始。

A standard LLM-based system that started fresh with each interaction

一种能够记住患者信息的记忆增强型人工智能代理

A memory-augmented AI agent that could retain patient information

结果令人瞩目。标准系统要求患者每次就诊时都要重复一遍病史和偏好。而记忆增强版系统能够记住患者的病史和偏好,从而使预约安排时间缩短了70%,患者满意度提高了45%。

The results were striking. The standard system required patients to repeat their medical history and preferences in every interaction. The memory-augmented version remembered patient histories and preferences, leading to 70% faster scheduling times and a 45% increase in patient satisfaction.

这项实验凸显了生成式人工智能中当前存在的记忆悖论:最先进的人工智能系统可以处理复杂的信息,但往往无法记住有关用户的简单细​​节。

This experiment highlighted the current Memory Paradox in generative AI: The most advanced AI systems can process complex information but often can’t remember simple details about their users.

人工智能代理记忆的三大主要目标

The Three Main Goals of AI Agents’ Memory

通过我们的研究和实践,我们确定了记忆的三个关键目标,这些目标使其成为智能体的基础:

Through our research and implementation, we’ve identified three critical goals of memory that make it the foundation of agentic intelligence:

首先是上下文理解。传统的生成式人工智能系统独立处理每个输入,就像一个人随机翻阅不同书籍的页面一样。而具备记忆功能的人工智能代理则可以在交互过程中保持上下文关联,类似于我们理解连续对话的方式。这种能力对于有意义的交互和问题解决至关重要。

First is contextual understanding. Traditional generative AI systems process each input independently, like a person reading random pages from different books. Memory-enabled AI agents can maintain context across interactions, similar to how we follow a continuous conversation. This capability is crucial for meaningful interactions and problem-solving.

第二点是学习和适应。根据我们的经验,我们观察到最成功的AI应用都能够从过去的交互中学习。例如,一家制造公司的AI质量控制系统不仅能够检测缺陷,而且还记住了问题的模式,随着时间的推移,误报率降低了 40%。

The second is learning and adaptation. Through our experience, we have observed that the most successful AI implementations were those that could learn from past interactions. For example, a manufacturing company’s AI quality control system not only detected defects but also remembered patterns of issues, leading to a 40% reduction in false positives over time.

第三点是规模化个性化。客户越来越期望获得个性化的体验。具备记忆功能的AI代理可以通过保留和学习个人互动经验来实现这一点,同时还能保障隐私和安全。

The third is personalization at scale. Customers increasingly expect personalized experiences. Memory-enabled AI agents can deliver this by retaining and learning from individual interactions while maintaining privacy and security.

人工智能代理记忆的三层结构

The Three Layers of AI Agents’ Memory

与人类记忆的组织方式类似,人工智能代理的记忆可以构建成三个相互关联的层次,每一层都有其特定的功能,包括保存上下文、促进学习和实现随时间推移的适应:

Much like the organization of human memory, the memory of AI agents can be structured into three interconnected layers, each with a specific function in preserving context, facilitating learning, and enabling adaptation over time:

1. 短期记忆(STM)——即时情境保持

1. Short-Term Memory (STM) – Immediate Context Retention

短期记忆(STM)作为人工智能的工作记忆,保存最近的交互信息,确保单次会话中上下文的连贯性。它处理当前输入,跟踪正在进行的对话,并运用注意力机制来优先处理相关信息。然而,短期记忆的容量有限——随着新数据的输入,旧信息会被覆盖,除非它们被转移到长期记忆中。

STM functions as the AI’s working memory, holding recent interactions and ensuring contextual continuity within a single session. It processes current inputs, tracks ongoing conversations, and applies attention mechanisms to prioritize relevant information. However, STM has a limited capacity—as new data comes in, older information is overwritten unless transferred to long-term memory.
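
The overwrite behavior described above can be sketched as a bounded buffer. This is a toy illustration under our own naming (`ShortTermMemory` is hypothetical), not how any particular model implements its working memory:

```python
from collections import deque

class ShortTermMemory:
    """A bounded buffer: once full, the oldest turn is silently dropped,
    mirroring how context beyond the window is lost unless it is
    transferred to long-term storage."""
    def __init__(self, capacity: int = 4):
        self.turns = deque(maxlen=capacity)

    def add(self, turn: str) -> None:
        self.turns.append(turn)

    def context(self) -> list:
        return list(self.turns)

stm = ShortTermMemory(capacity=3)
for t in ["greeting", "asks about pricing", "mentions allergy", "books appointment"]:
    stm.add(t)
# "greeting" has been overwritten; only the 3 most recent turns remain
print(stm.context())
```

The `deque(maxlen=…)` makes eviction implicit: the caller never deletes anything, old context simply falls off the far end.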

2. 长期记忆(LTM)——随时间推移的结构化记忆

2. Long-Term Memory (LTM) – Structured Retention Over Time

长时记忆(LTM)超越了基于会话的记忆,它存储结构化信息以供将来参考。这包括用户偏好、过往交互、已学习的工作流程以及特定领域知识。LTM 使人工智能能够识别重复出现的模式、回忆过往交互,并根据积累的经验提供个性化响应。与短时记忆(STM)不同,LTM 的设计目的是持久保存信息,确保人工智能代理不会在每次交互中从头开始。

LTM extends beyond session-based memory by storing structured information for future reference. This includes user preferences, past interactions, learned workflows, and domain-specific knowledge. LTM enables AI to recognize recurring patterns, recall past interactions, and personalize responses based on accumulated experiences. Unlike STM, LTM is designed to persist, ensuring that AI agents do not start from scratch in every interaction.
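
A minimal sketch of this persistence property, assuming a simple JSON file as the backing store (real systems typically use databases or vector stores; `LongTermMemory` is our own illustrative name):

```python
import json
import os
import tempfile

class LongTermMemory:
    """Persists structured facts (preferences, past interactions) to disk
    so that a new session does not start from scratch."""
    def __init__(self, path: str):
        self.path = path
        self.store = {}
        if os.path.exists(path):            # reload anything a prior session saved
            with open(path) as f:
                self.store = json.load(f)

    def remember(self, key: str, value) -> None:
        self.store[key] = value
        with open(self.path, "w") as f:     # write-through on every update
            json.dump(self.store, f)

    def recall(self, key: str, default=None):
        return self.store.get(key, default)

path = os.path.join(tempfile.mkdtemp(), "ltm.json")
session1 = LongTermMemory(path)
session1.remember("preferred_contact", "email")

session2 = LongTermMemory(path)   # a fresh "session" reloads persisted facts
print(session2.recall("preferred_contact"))
```

The key contrast with the short-term buffer: `session2` never saw the `remember` call, yet it can still `recall` the fact, because retention outlives the session.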

3. 反馈回路——学习与适应

3. Feedback Loops – Learning and Adaptation

反馈回路是人工智能记忆的自我改进机制,能够随着时间的推移不断完善短期记忆(STM)和长期记忆(LTM)。通过整合用户反馈——无论是显性反馈(例如,更正、评分)还是隐性反馈(例如,互动模式、错误追踪)——人工智能可以调整其记忆结构,从而提高准确性和相关性。这一过程使人工智能能够通过强化有用知识并剔除过时或错误信息而持续改进。

Feedback loops act as the self-improvement mechanism of AI memory, refining both STM and LTM over time. By incorporating user feedback—whether explicit (e.g., corrections, ratings) or implicit (e.g., engagement patterns, error tracking)—the AI adjusts its memory structures to enhance accuracy and relevance. This process allows AI to improve continuously by reinforcing useful knowledge and discarding outdated or incorrect information.
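
One way to sketch such a loop is with per-fact confidence scores that explicit feedback reinforces or decays; the class name, scores, and thresholds below are illustrative assumptions, not a standard algorithm:

```python
class FeedbackLoop:
    """Reinforces facts the user confirms and decays facts the user
    corrects, discarding entries whose confidence drops too low."""
    def __init__(self, threshold: float = 0.2):
        self.facts = {}          # fact -> confidence in [0, 1]
        self.threshold = threshold

    def observe(self, fact: str, confidence: float = 0.5) -> None:
        self.facts[fact] = confidence

    def feedback(self, fact: str, positive: bool) -> None:
        c = self.facts.get(fact, 0.5)
        self.facts[fact] = min(1.0, c + 0.2) if positive else max(0.0, c - 0.3)
        if self.facts[fact] < self.threshold:
            del self.facts[fact]     # outdated/incorrect knowledge is discarded

loop = FeedbackLoop()
loop.observe("user prefers morning slots")
loop.feedback("user prefers morning slots", positive=True)   # explicit confirmation
loop.observe("user lives in Paris", confidence=0.4)
loop.feedback("user lives in Paris", positive=False)         # explicit correction
print(sorted(loop.facts))
```

After the correction, the low-confidence fact is pruned, while the confirmed one survives with higher confidence, which is the "reinforce useful, discard incorrect" dynamic in miniature.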

图像

图 7.1:智能体人工智能记忆的三层结构(来源:© Bornet 等人)

Figure 7.1: The Three Layers of Agentic AI Memory (Source: © Bornet et al.)

这些层是如何协同工作的

How These Layers Work Together

AI记忆在这三个层面上动态运作:短期记忆(STM)保存即时上下文,长期记忆(LTM)确保会话结束后信息的连续性,反馈回路则不断完善两者,从而驱动持续学习。这种分层方法使AI代理能够进行情境感知、不断演进和个性化的交互,使其随着时间的推移变得更加智能和可靠。

AI memory functions dynamically across these three layers: STM holds immediate context, LTM ensures continuity beyond a session, and feedback loops refine both to drive continuous learning. This layered approach allows AI agents to engage in context-aware, evolving, and personalized interactions, making them more intelligent and reliable over time.

可以将这个智能体人工智能框架的层级想象成洋葱。短期记忆(STM)构成最外层,为交互提供即时上下文,但它是暂时的,会随着新信息的到来而迅速消退。在其下方,长期记忆(LTM)作为基础,存储关键知识,确保人工智能不会在每次交互中都从零开始。核心部分,反馈回路不断完善和强化短期记忆和长期记忆,使系统能够从用户交互中学习并随着时间的推移而不断改进。正如洋葱通过层层叠加而生长一样,这种记忆架构使人工智能体能够发展出更深层次的智能,确保其响应的连续性、适应性和长期可靠性。

Think of the layers of this agentic AI framework like an onion. Short-term memory (STM) forms the outermost layer, providing immediate context for interactions, but it is temporary and fades quickly as new information arrives. Beneath it, long-term memory (LTM) serves as the foundation, storing critical knowledge and ensuring that AI does not restart from zero in every interaction. At the core, feedback loops continuously refine and strengthen both STM and LTM, enabling the system to learn from user interactions and improve over time. Just as an onion grows by building upon its layers, this memory architecture allows AI agents to develop deeper intelligence, ensuring continuity, adaptability, and long-term reliability in their responses.

在接下来的章节中,我们将深入探讨人工智能代理中不同类型的记忆是如何运作的,它们的实际应用以及我们必须解决的局限性。但请记住这个基本事实:正如人类智能建立在我们记忆和从经验中学习的能力之上一样,人工智能代理的未来也取决于它们是否具备这种能力。

In the following sections, we’ll delve deeper into how different types of memory work in AI agents, their practical applications, and the limitations we must address. But remember this fundamental truth: just as human intelligence is built on our ability to remember and learn from experiences, the future of AI agents depends on their ability to do the same.

人工智能代理中短期记忆的复杂运作

The Intricate Dance of Short-Term Memory in AI Agents

想象一下你身处新加坡繁忙的十字路口。你同时处理着多条信息:交通信号灯的变化、行人过马路、其他车辆行驶,以及导航系统提供的路线指引。这种持续不断的即时信息流以及你如何处理这些信息,至关重要。它阐明了短期记忆的本质——短期记忆不仅是人类认知的重要组成部分,也是人工智能体的重要组成部分。

Picture yourself at a busy intersection in Singapore. You’re processing multiple streams of information simultaneously: traffic lights changing, pedestrians crossing, other vehicles moving, and your navigation system providing directions. This constant flow of immediate information and how you process it illustrates the essence of short-term memory—a critical component not just for human cognition but for AI agents as well.

当下挑战

The Challenge of the Present Moment

正如我们的大脑必须不断处理大量涌入的信息并保持对上下文的理解一样,人工智能体也面临着类似的挑战。然而,它们处理这种即时信息的方式既引人入胜,又与人类认知有着本质的不同。根据我们的经验,我们发现,对于任何希望在人工智能技术领域取得成功的人来说,理解这些差异至关重要。

Just as our brains must constantly juggle incoming information while maintaining immediate context, AI agents face a similar challenge. However, their approach to handling this immediate information processing is both fascinating and fundamentally different from human cognition. Through our experience, we’ve discovered that understanding these differences is crucial for anyone seeking to succeed with AI technology.

让我们先来看一个简单的实验,这个实验是我们咨询工作中经常使用的,用来证明短期记忆的重要性:

Let’s start with a simple experiment that we often use in our consulting work to demonstrate the importance of short-term memory:

与 Claude 或 Gemini 等 AI 助手展开对话,开始讲述你一天中的故事。每说几句话后,就让 AI 总结一下你之前说过的内容。

Open a conversation with an AI assistant like Claude or Gemini and start telling a story about your day. After every few sentences, ask the AI to summarize what you’ve said so far.

你会注意到一个有趣的现象:人工智能可以暂时记住上下文,但最终会开始遗忘早期的细节。这与人类的短期记忆颇为相似,认知科学家告诉我们,人类的短期记忆通常一次只能记住大约7个(正负2个)项目。123

You’ll notice something interesting: the AI can maintain context for a while, but eventually, it starts losing track of earlier details. This isn’t unlike human short-term memory, which cognitive scientists tell us can typically hold about 7 (plus or minus 2) items at once.123

上下文窗口:人工智能的工作记忆

The Context Window: AI’s Working Memory

许多客户最惊讶的发现之一是人工智能系统如何处理即时信息。目前的人工智能系统,特别是大语言模型(LLM),是在我们称之为“上下文窗口”的范围内运行的——可以将其理解为人工智能可以查看和处理信息的临时工作空间。这个窗口就像一块只能容纳有限数量文字的白板,一旦写满,就必须擦掉旧信息才能腾出空间输入新内容。

One of the most surprising discoveries for many of our clients is how AI systems actually handle immediate information. Current AI systems, particularly LLMs, operate within what we call a “context window”: think of it as a temporary workspace where the AI can see and process information. This window is like a whiteboard that can hold a limited amount of text, and once it’s full, older information must be erased to make room for new input.

李、邵如林及其同事的最新研究揭示了这些上下文窗口的一些有趣之处。他们的研究发现了他们所谓的“上下文上限”——即向智能体记忆中添加更多信息开始降低而非提升其性能的临界点。<sup> 124</sup>

Recent work by Li, Rulin Shao, and their colleagues revealed something fascinating about these context windows. Their research uncovered what they call the “context ceiling”—the point at which adding more information to an agent’s memory starts degrading rather than enhancing performance.124

这种现象并非仅仅关乎达到最大容量;它揭示了智能体在处理和整合大量信息方面所面临的根本性挑战。研究人员发现,性能下降往往在达到标称内存极限之前就已经开始,这表明有效的内存管理不仅仅关乎存储容量,更关乎智能体如何处理其短期记忆中的信息。

This phenomenon isn’t simply about reaching maximum capacity; rather, it reveals fundamental challenges in how agents process and integrate large amounts of information. The researchers found that performance degradation often begins well before reaching nominal memory limits, suggesting that effective memory management isn’t just about storage capacity but about how an agent processes information within its short-term memory.

为了理解这一点,我们将短期记忆分解为三个不同的组成部分:上下文窗口、注意力机制和令牌管理。这三个要素构成了人工智能代理处理任务、确定信息优先级和做出响应的基础。通过一个简单的实验来探索这些组成部分,我们不仅可以揭示它们的工作原理,还可以了解如何更有效地引导代理——以及这对商业环境中的领导者意味着什么。

To understand this, let’s break down short-term memory into three distinct components: the context window, attention mechanisms, and token management. These three elements form the backbone of how an AI agent engages with tasks, prioritizes information, and delivers responses. By exploring these components through a simple experiment, we can uncover not only how they work but also how to guide the agent more effectively—and what this means for leaders in business contexts.

上下文窗口

The Context Window

想象一下,你遇到一份晦涩难懂的技术报告——也许是关于量子计算的白皮书,或者是一份关于可再生能源的政策文件。你指示生成式人工智能聊天机器人:“用五个要点概括这篇文章,重点关注主要主题或发现。”在处理文本的过程中,上下文窗口成为第一个也是最关键的组成部分。

Imagine you’ve come across a dense technical report—perhaps a whitepaper on quantum computing or a policy document on renewable energy. You instruct the generative AI Chatbot: “Summarize this article in five bullet points, focusing on the main themes or findings.” As it processes the text, the context window becomes the first and most critical component at play.

可以将上下文窗口想象成人工智能的办公桌——一个用来整理完成任务所需信息的工作空间。在像 GPT 这样的模型中,这张办公桌最多可以容纳几十万个词元,也就是大约 10 万个单词;而像谷歌 Gemini 这样的尖端系统则拥有更大的容量,可以容纳数百万个词元。

Think of the context window as the AI’s desk—a workspace where it arranges all the information it needs to complete the task. In a model like GPT, this desk can accommodate up to a few hundred thousand tokens, or roughly 100,000 words, while cutting-edge systems like Google’s Gemini offer even larger capacities with millions of tokens.

但即使是巨大的工作台也有其局限性。人工智能必须决定如何安排信息,以确保其易于管理。如果文档超出工作台的尺寸,某些部分将被完全省略,而其他部分则会被压缩或简化。人工智能处理任务的能力取决于它如何有效地利用这有限的空间。125

But even a massive desk has its limits. The AI must decide how to arrange the information to ensure it stays manageable. If the document exceeds the desk’s size, some parts are left out entirely, while others are compressed or simplified. The AI’s ability to process the task hinges on how effectively it uses this finite space.125
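
The "desk runs out of space" behavior can be sketched as a truncation pass over the conversation. Here tokens are approximated as whitespace-separated words, which is an assumption for illustration; real tokenizers (e.g. BPE) count differently:

```python
def fit_to_context_window(messages: list, max_tokens: int) -> list:
    """Keep the most recent messages that fit the budget; anything older
    is erased from the 'whiteboard' entirely."""
    kept, used = [], 0
    for msg in reversed(messages):          # walk newest-first
        cost = len(msg.split())             # crude word-count proxy for tokens
        if used + cost > max_tokens:
            break                           # desk is full: drop everything older
        kept.append(msg)
        used += cost
    return list(reversed(kept))             # restore chronological order

history = [
    "My name is Ana and I manage the Lisbon office",
    "We need the Q3 report by Friday",
    "Also add the supplier audit figures",
]
print(fit_to_context_window(history, max_tokens=14))
```

With a budget of 14 "tokens," the oldest message (the one containing the user's name) is the first to be dropped, which is exactly why long conversations lose early details.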

注意力机制

The Attention Mechanisms

一旦信息呈现在桌面上,注意力机制便开始发挥作用。人工智能能够专注于输入信息中最相关的部分。想象一下,你用荧光笔扫描文档,标记出最重要的句子或观点。注意力机制会动态地完成这项工作,根据你的提示为不同的信息片段赋予权重。在这种情况下,人工智能会聚焦于诸如量子纠缠的作用或可再生能源政策的全球影响等宏观主题,而过滤掉不太重要的细节,例如具体的例子或次要的论点。

Once the information is on the desk, the attention mechanisms take over. This is the AI’s ability to focus on the most relevant parts of the input. Imagine scanning the document with a highlighter, marking the sentences or ideas that matter most. Attention mechanisms do this dynamically, assigning weights to different pieces of information based on your prompt. In this case, the AI zeroes in on overarching themes like the role of quantum entanglement or the global impact of renewable energy policies, filtering out less significant details like specific examples or minor arguments.
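
The "highlighter" analogy maps onto the softmax weighting at the heart of attention. The sketch below uses hand-picked relevance scores as an assumption (in a real model these scores come from learned query-key dot products):

```python
import math

def attention_weights(scores: dict) -> dict:
    """Softmax turns raw relevance scores into weights that sum to 1:
    the mechanism that decides where the model's focus goes."""
    exps = {k: math.exp(v) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: v / total for k, v in exps.items()}

# Hypothetical relevance of passages to the prompt "summarize main themes"
weights = attention_weights({
    "quantum entanglement overview": 2.5,
    "policy impact section": 2.0,
    "minor footnote example": 0.3,
})
print(max(weights, key=weights.get))
```

Because the weights are normalized, boosting one passage necessarily deprioritizes the others, which is why asking for "main themes" pushes granular details out of focus.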

注意力机制非常强大,但它们会受到你所提供指令的影响。如果你要求的是“主要主题”,那么人工智能的注意力自然会降低细粒度细节的优先级。因此,务必发送清晰的指示,并确保其与目标高度一致。126

Attention mechanisms are incredibly powerful, but they’re shaped by the instructions you provide. If you ask for “main themes,” the AI’s focus will naturally deprioritize granular specifics. Hence, it is important to send clear instructions and make sure they align well with the goal.126

令牌管理

Token Management

最后,令牌管理就像人工智能的笔记系统。想象一下,你在听讲座,试图抓住要点,但又不想把所有内容都记下来。你会专注于核心内容,跳过那些重复或不太重要的部分。人工智能在接近内存上限时也会这样做——它会进行总结、提炼,并决定哪些信息应该保留在它的“工作空间”中。但这种方法也有其不足之处。一些细微的细节或例子可能会被忽略,因为它们在当时看来并不那么重要。令牌管理是一种平衡,它既要确保人工智能保持简洁和专注,又要保留足够的细节以有效地完成其任务。

Finally, token management is like the AI’s note-taking system. Imagine you’re in a lecture, trying to capture the key points without writing everything down. You focus on the essentials and skip what feels repetitive or less important. AI does the same as it nears its memory limit—it summarizes, condenses, and decides what to keep in its mental workspace. But this approach has trade-offs. Some nuanced details or examples might get left out because they seemed less critical at the moment. Token management is a balancing act, ensuring the AI stays concise and focused while preserving enough detail to fulfill its purpose effectively.
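
The summarize-and-condense trade-off can be sketched as follows. The summarization step here is a placeholder string (a real agent would ask the model to write the summary), and the half-budget split is an arbitrary illustrative choice:

```python
def compress_history(turns: list, max_tokens: int) -> list:
    """When the transcript outgrows the budget, collapse older turns into
    a one-line summary and keep only recent turns verbatim."""
    count = lambda t: len(t.split())        # crude word-count proxy for tokens
    if sum(count(t) for t in turns) <= max_tokens:
        return turns                        # everything still fits: keep it all
    recent, used = [], 0
    for t in reversed(turns):               # keep newest turns verbatim
        if used + count(t) > max_tokens // 2:
            break
        recent.insert(0, t)
        used += count(t)
    older = turns[: len(turns) - len(recent)]
    summary = "SUMMARY of %d earlier turns" % len(older)   # placeholder summary
    return [summary] + recent

turns = ["alpha beta gamma"] * 6            # 18 "tokens" in total
print(compress_history(turns, max_tokens=12))
```

The trade-off is visible in the output: the six original turns become one lossy summary plus two verbatim turns, so any nuance in the summarized portion is gone.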

为了进一步测试,您可以向生成式人工智能聊天机器人提出一个问题:“文章中关于纠缠在纠错中的作用是如何描述的?”如果这个细节在文章的主要主题中有所突出,人工智能就能快速准确地检索出来。但如果它被隐藏在不太显眼的章节中,或者缺乏上下文强调,人工智能就可能出错。它可能会尝试根据相关主题推断答案,或者如果推断失败,则会返回一个更宽泛、更不精确的回答。这是因为在注意力阶段,这个具体细节没有被优先考虑,或者在词元管理过程中丢失了。

To test this further, you follow up with a question to the generative AI Chatbot: “What does the article say about the role of entanglement in error correction?” If this detail was highlighted in the main themes, the AI retrieves it quickly and accurately. But if it was buried in a less prominent section or lacked contextual emphasis, the AI might stumble. It might attempt to infer an answer based on related themes or, failing that, return a broader, less precise response. This happens because the specific detail wasn’t prioritized during the attention phase or was lost during token management.

该实验揭示了人工智能的记忆架构是如何运作的:上下文窗口定义了它的界限,注意力机制塑造了它的关注点,而标记管理则强制执行权衡取舍。

The experiment reveals the AI’s memory architecture in action: the context window defines its limits, attention mechanisms shape its focus, and token management enforces trade-offs.

现实世界的影响

The Real-World Impact

为了了解短期记忆的各个组成部分——上下文窗口、注意力机制和令牌管理——如何在真实场景中协同运作,我们可以考虑一家大型金融服务提供商开展的一个项目。该项目的任务是开发一个能够处理客户关于投资产品的复杂咨询的人工智能代理。挑战相当巨大。客户通常需要对投资选项进行详细解释,期望系统能够在冗长的对话中保持上下文关联,并且经常提及多个账户或特定的历史交易。人工智能不仅需要跟踪所有这些信息,还需要在同一对话中进行交叉引用,以提供准确且个性化的回复。

To see how the components of short-term memory—context windows, attention mechanisms, and token management—come together in a real-world scenario, consider a project undertaken by a major financial services provider. The task was to develop an AI agent capable of handling intricate customer inquiries about investment products. The challenges were considerable. Customers often required detailed explanations about investment options, expected the system to maintain context across lengthy conversations, and frequently referenced multiple accounts or specific historical transactions. The AI needed to not only keep track of all this information but also cross-reference it within the same conversation to provide accurate and personalized responses.

最初,该系统仅具备基本的短期记忆能力。它擅长回答简单的任务,例如解释两种共同基金之间的区别或查询账户余额。然而,当面对多步骤流程时,它就显得力不从心。例如,如果客户要求比较投资选项、更新账户偏好设置并根据假设情景计算潜在收益,人工智能常常会忘记对话的先前内容。这会导致回答支离破碎、重复或错误,尤其是在处理多个账户或转换复杂的金融概念时。

Initially, the system was equipped with basic short-term memory capabilities. It excelled at straightforward tasks like answering single-step questions—such as explaining the differences between two mutual funds or retrieving account balances. However, it faltered when confronted with multi-step processes. For example, if a customer asked to compare investment options, update their account preferences, and calculate potential returns based on a hypothetical scenario, the AI often lost track of earlier parts of the conversation. This led to fragmented responses, repetition, or errors, especially when juggling multiple accounts or shifting between complex financial concepts.

为了解决这些不足,公司内部的人工智能开发团队实施了增强型短期记忆策略。首先,他们通过将对话分割成不同的片段来优化上下文窗口,确保人工智能每次能够专注于讨论的一部分,而不会忽略之前的细节。其次,他们微调了注意力机制,优先处理输入中最相关的部分,例如关键的客户请求、账户详情和重要的财务条款。最后,他们引入了令牌管理协议,用于总结和保留对话早期的要点,同时丢弃无关或冗余的信息。

To address these shortcomings, the company’s internal AI development team implemented enhanced short-term memory strategies. First, they optimized the context window by splitting conversations into distinct segments, ensuring the AI could focus on one part of the discussion at a time without losing sight of previous details. Next, they fine-tuned the attention mechanisms to prioritize the most relevant parts of the input—such as key customer requests, account-specific details, and critical financial terms. Finally, they introduced token management protocols to summarize and retain essential points from earlier in the conversation while discarding irrelevant or redundant information.

这些改进带来了变革性的结果。经过这些增强,人工智能能够以惊人的准确度处理复杂的客户互动。它能够无缝地完成多步骤流程,例如比较投资组合表现、提供量身定制的投资建议以及执行账户更新——所有这些都可以在一次会话中完成。错误率下降了65%,客户满意度评分显著提高,这反映出该系统能够进行细致且连贯的财务讨论。虽然人工智能在细致程度和适应性方面仍不及人类顾问,但这一改进标志着人工智能在处理复杂财务互动方面取得了重大飞跃,其精准度和可靠性都得到了显著提升。

The results were transformative. With these enhancements, the AI was able to handle intricate customer interactions with remarkable accuracy. It seamlessly navigated multi-step processes, such as comparing portfolio performance, providing tailored investment advice, and executing account updates—all within a single session. Error rates dropped by 65%, and customer satisfaction scores rose significantly, reflecting the system’s ability to engage in detailed and coherent financial discussions. While still not as nuanced or adaptable as a human advisor, this improvement marked a significant leap in AI’s ability to manage complex financial interactions with greater precision and reliability.

这个例子凸显了精心设计的短期记忆管理在人工智能系统中的巨大潜力。它不仅关乎性能提升,更关乎信任建立。当客户看到人工智能能够记住他们的需求、适应复杂的要求并提供准确的信息时,他们就会对其能力充满信心。对于企业而言,这不仅仅是运营效率的提升,更是一项能够将客户体验提升到全新高度的竞争优势。

This example underscores the potential of well-designed short-term memory management in AI systems. It’s not just about enhancing performance—it’s about building trust. When customers see that an AI can remember their needs, adapt to complex demands, and deliver accurate information, they feel confident in its capabilities. For businesses, this isn’t just an operational improvement; it’s a competitive advantage that elevates the customer experience to new heights.

引导短期记忆以获得更好的结果

Guiding the Short-Term Memory for Better Outcomes

要充分发挥人工智能代理的作用,用户必须积极引导其记忆过程。这首先要精心设计与当前任务相符的提示语。如果你的目标是提取大的主题,那么像“总结要点”这样简洁明了的指令就非常有效。另一方面,如果你需要的是具体的细节,那么你的指令就应该更加精确:“请详细解释第三部分,并重点举例说明。”

To get the most from an AI agent, users must take an active role in guiding its memory processes. This starts with crafting thoughtful prompts that align with the task at hand. If your goal is to extract broad themes, straightforward instruction like “Summarize the key points” works well. On the other hand, if you’re looking for specific details, your guidance should be more precise: “Provide a detailed explanation of Section 3, focusing on examples.”

领导者必须意识到,人工智能与人类团队成员不同,它不会凭直觉进行优先级排序或推理。相反,当任务清晰明了、结构化呈现时,它才能发挥最佳效用,最大限度地提高其专注力和处理能力。例如,在客户服务应用中,与其向人工智能展示客户的完整聊天记录,不如提供关键互动的简洁摘要。同样,在战略规划中,将市场分析分解成更小、更聚焦的部分,可以让人工智能迭代地处理数据,从而随着时间的推移产生更深入的洞察。

Leaders must realize that AI, unlike a human team member, doesn’t intuitively prioritize or infer. Instead, it thrives when tasks are presented clearly and structured to maximize its focus and processing capabilities. For example, in customer service applications, rather than presenting the AI with a customer’s entire chat history, a better approach is to provide concise summaries of key interactions. Similarly, in strategic planning, breaking down market analysis into smaller, focused sections enables the AI to engage with the data iteratively, producing deeper insights over time.

除了工作流程之外,这种理解还要求我们重新思考团队动态。人工智能代理擅长管理和分析海量数据,但它们仍然需要人类监督来指导其工作优先级。这为混合协作创造了机会——人类负责输入和解读输出​​,而人工智能则负责处理重复性、数据密集型任务。两者相辅相成,形成互补的伙伴关系。通过促进这种协同作用,企业领导者可以释放更高水平的效率和创新能力,同时最大限度地降低人工智能固有局限性带来的风险。

Beyond workflows, this understanding demands a rethinking of team dynamics. AI agents excel at managing and analyzing vast quantities of data, but they still need human oversight to guide their priorities. This creates an opportunity for hybrid collaboration—humans shape the inputs and interpret the outputs, while AI tackles repetitive, data-intensive tasks. Together, they form a complementary partnership. By fostering this synergy, business leaders can unlock new levels of efficiency and innovation while minimizing the risks associated with AI’s inherent limitations.

这种协作方式催生了一些实用策略。其中最有效的策略之一是信息分块。正如人类将知识组织成易于管理的组块一样,人工智能代理在将输入信息结构化为易于理解的片段时也能表现得更好。例如,在客户服务系统中实施信息分块后,响应准确率提高了 40%。按主题或相关性分组的信息使人工智能能够更高效地利用其短期记忆,从而给出更清晰、更精确的答案。

Practical strategies have emerged from this collaborative approach. One of the most effective is chunking information. Just as humans organize knowledge into manageable groups, AI agents perform better when their inputs are structured into digestible segments. For instance, implementing chunking in a customer service system led to a 40% improvement in response accuracy. Information grouped by topic or relevance enabled the AI to navigate its short-term memory more efficiently, resulting in sharper, more precise answers.
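
A minimal sketch of the chunking idea: group records by topic before handing them to the model, rather than feeding it an interleaved transcript. The record format and function name are illustrative assumptions:

```python
def chunk_by_topic(records: list) -> dict:
    """Group (topic, text) records so each topic can be presented to the
    model as one coherent chunk."""
    chunks = {}
    for topic, text in records:
        chunks.setdefault(topic, []).append(text)
    return {t: " ".join(parts) for t, parts in chunks.items()}

records = [
    ("billing", "Customer disputes the March invoice."),
    ("shipping", "Package delayed at customs."),
    ("billing", "Refund of $40 approved."),
]
chunks = chunk_by_topic(records)
print(chunks["billing"])
```

The model then sees all billing context together, instead of having it scattered across the transcript with shipping details in between.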

另一种强大的技术是优先级队列,它确保最关键的信息始终保存在人工智能的短期记忆中。例如,在医疗保健应用中,患者的症状和生命体征优先于行政细节。这种策略意味着关键的医疗信息可以随时获取,从而增强了人工智能在时间紧迫的情况下协助医疗服务提供者的能力。这些方法的成功表明,精心构建信息结构可以显著提升各行业的人工智能性能。

Another powerful technique is priority queuing, which ensures the most critical information stays accessible in the AI’s short-term memory. In a healthcare application, for example, patient symptoms and vital signs were prioritized over administrative details. This strategy meant that critical medical information was readily available, enhancing the AI’s ability to assist healthcare providers in time-sensitive scenarios. The success of these approaches demonstrates how the deliberate structuring of information can significantly improve AI performance across industries.
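
Priority queuing can be sketched with a heap: when the memory budget is exceeded, the least important item is evicted first. The priority values and class name are illustrative assumptions:

```python
import heapq

class PriorityMemory:
    """Items with higher priority (e.g. vital signs) survive eviction;
    low-priority items (e.g. admin details) are dropped first."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.heap = []                       # min-heap: lowest priority on top

    def add(self, priority: int, item: str) -> None:
        heapq.heappush(self.heap, (priority, item))
        if len(self.heap) > self.capacity:
            heapq.heappop(self.heap)         # evict the least important item

    def items(self) -> list:
        return sorted(self.heap, reverse=True)

mem = PriorityMemory(capacity=2)
mem.add(1, "billing address update")     # administrative detail
mem.add(9, "chest pain reported")        # critical symptom
mem.add(8, "blood pressure 180/110")     # vital sign
print([item for _, item in mem.items()])
```

Contrast this with the plain sliding window shown earlier: there, recency alone decided survival; here, the clinically critical facts stay in memory even though the administrative note arrived more recently than nothing at all.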

最终,通过理解和掌握短期记忆的机制——上下文窗口、注意力机制和令牌管理——你可以确保人工智能不仅成为一种工具,而且成为增长和创新的催化剂。

Ultimately, by understanding and mastering the mechanics of short-term memory—context windows, attention mechanisms, and token management—you can ensure that AI becomes not just a tool but a catalyst for growth and innovation.

展望短期记忆的未来

Looking to the Future of Short-Term Memory

人工智能代理的短期记忆未来发展前景令人振奋,其突破性进展有望彻底改变机器处理和记忆信息的方式。其中,三项最新进展尤为突出,每一项都展现了人工智能在更高效地处理复杂任务方面的潜力。

The future of short-term memory in AI agents holds exciting possibilities, with breakthroughs that promise to revolutionize how machines process and retain information. Among these, three recent advancements stand out, each offering a glimpse into the potential of AI to handle more complex tasks with greater efficiency.

其中一项创新是地标注意力机制(Landmark Attention)<sup> 127</sup>,它为人工智能提供了一种类似心理地图的机制,使其能够驾驭海量信息。试想一下,阅读一本长篇小说或分析一份综合报告——地标注意力机制使人工智能能够将这些庞大的文本分割成易于管理的部分,并识别出关键点或“地标”以便集中关注。这种方法确保人工智能能够快速获取相关细节,而不会被海量信息所淹没。它解决了当前系统在处理长序列信息时面临的一个关键限制,使人工智能能够一次性处理整本书或数据集。对于企业而言,这意味着可以更快、更全面地分析客户反馈、法律文件或财务数据。

One such innovation is Landmark Attention,127 a mechanism that equips AI with a sort of mental map to navigate vast amounts of information. Imagine reading a lengthy novel or analyzing a comprehensive report—Landmark Attention enables AI to divide these massive texts into manageable sections and identify key points, or “landmarks,” to focus on. This approach ensures the AI can access relevant details quickly without becoming overwhelmed by the sheer volume of information. It addresses a critical limitation in current systems that struggle with long sequences, making it possible for AI to process entire books or datasets in one go. For businesses, this could mean faster and more comprehensive analysis of customer feedback, legal documents, or financial data.

另一项颠覆性的研究名为可变尺寸窗口注意力(VSA)。<sup>128</sup>它使人工智能能够动态调整其关注点。想象一下一副可调节的眼镜,让你在需要时放大观察细节,在需要时缩小视野以把握整体背景。VSA 为人工智能提供了这种灵活性,根据任务需求创建不同大小的“窗口”。例如,总结一份冗长的报告可能需要一个宽广的视角来捕捉总体主题,而翻译一个句子则需要更聚焦的视角。这种适应性使人工智能更加通用,能够精准高效地应对各种挑战。在实际应用中,这意味着人工智能可以无缝地在撰写文章和分析法律合同之间切换,并以卓越的技能完成这两项任务。

Another game-changing research is called Varied-Size Window Attention (VSA).128 It allows AI to adapt its focus dynamically. Picture an adjustable pair of glasses that lets you zoom in on fine details when needed and zoom out to take in the broader context. VSA provides AI with this flexibility, creating “windows” of varying sizes depending on the task requirements. For example, summarizing a lengthy report may require a wide view to capture overarching themes, while translating a sentence demands a more focused lens. This adaptability makes AI far more versatile and capable of handling diverse challenges with precision and efficiency. In practical terms, this means AI could seamlessly transition between drafting an article and analyzing a legal contract, performing both tasks with remarkable skill.

第三项研究名为 RetrievalAttention,<sup> 129</sup>在内存管理方面引入了新的效率水平。当前的人工智能系统通常试图处理和存储所有信息,甚至包括无关信息,这会导致资源浪费和性能下降。RetrievalAttention 通过教会人工智能在需要时仅检索最重要的信息来改变这一现状,就像人无需记住每个字就能回忆起谈话的要点一样。这种方法显著加快了处理速度,同时降低了能耗,使人工智能不仅速度更快,而且更具成本效益。其意义深远:试想一下,更流畅、更快速的人工智能助手能够处理复杂的查询而不会减速,并且能够在日常设备上高效运行。

The third research, called RetrievalAttention,129 introduces a new level of efficiency in managing memory. Current AI systems often attempt to process and store everything, even irrelevant information, leading to wasted resources and slower performance. RetrievalAttention changes this by teaching AI to retrieve only the most important pieces of information when needed, much like a person recalling the key points of a conversation without needing to remember every word. This method significantly speeds up processing times while reducing energy consumption, making AI not only faster but also more cost-effective. The implications are profound: imagine smoother, quicker AI assistants that can tackle complex queries without slowing down, all while running efficiently on everyday devices.
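
To convey the retrieve-only-what-matters idea, here is a toy sketch that fetches the top-k memory entries most similar to a query. Jaccard word overlap stands in for the vector similarity a real system would use; this is our illustration, not the algorithm from the RetrievalAttention paper:

```python
def retrieve_top_k(query_words: set, memory: list, k: int = 2) -> list:
    """Instead of attending over everything, fetch only the k cached
    entries most similar to the query."""
    def sim(entry: str) -> float:
        words = set(entry.lower().split())
        return len(words & query_words) / len(words | query_words)  # Jaccard
    return sorted(memory, key=sim, reverse=True)[:k]

memory = [
    "invoice sent to the client in april",
    "scuba diving trip planned for tahiti",
    "client asked about diving insurance",
]
query = set("diving in tahiti".split())
print(retrieve_top_k(query, memory, k=2))
```

Only two of the three entries are ever processed further, which is the source of the speed and energy savings: irrelevant memory is never loaded into the model's focus at all.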

这些进展共同代表了人工智能在处理上下文和记忆方面的一次飞跃。它们相辅相成,分别解决了同一根本挑战的不同方面。Landmark Attention 擅长组织海量信息;VSA 使人工智能更能适应不同的任务;而 RetrievalAttention 则确保记忆得到高效管理。

Together, these advancements represent a leap forward in how AI handles context and memory. They complement each other beautifully, addressing different aspects of the same fundamental challenge. Landmark Attention excels at organizing vast information; VSA makes AI more adaptable to varying tasks; and RetrievalAttention ensures that memory is managed efficiently.

理解和优化人工智能代理的短期记忆不仅仅是一项技术挑战,它对于创建能够与人类进行有效互动并高效处理复杂任务的系统至关重要。虽然短期记忆对于即时任务执行至关重要,但只有将其与长期记忆系统整合,才能真正发挥其潜力。这种整合使人工智能代理不仅能够高效处理即时信息,还能随着时间的推移不断学习和适应——我们将在下一节深入探讨这一主题。

Understanding and optimizing short-term memory in AI agents isn’t just a technical challenge—it’s fundamental to creating systems that can engage meaningfully with humans and handle complex tasks effectively. While short-term memory is crucial for immediate task performance, its real potential is realized when integrated with long-term memory systems. This integration allows AI agents to not just process immediate information effectively but also learn and adapt over time—a topic we’ll explore in depth in the next section.

长期记忆的力量:将人工智能从工具转变为合作伙伴

The Power of Long-Term Memory: Transforming AI from Tools to Partners

长期记忆的力量

The Power of Long-Term Memory

想象一下:一个人工智能助手不仅能帮你完成任务、回答问题,还能记住你所做的一切。不是一天、一周,甚至一年——而是几十年。它记住每一个重要的时刻、每一次挑战、每一次成功、每一次教训。它不只是存储数据,它更理解你的故事。

Imagine this: an AI agent that doesn’t just help you with tasks or answer your questions but remembers. Not for a day, or a week, or even a year—but across decades. It remembers every important moment, every challenge, every success, every lesson you’ve learned. It doesn’t just store data; it understands your story.

有了这样的教练或伙伴,你能取得怎样的成就?

What could you achieve with a coach or a companion like that?

想想过去一年你做出的商业决策。有些决策简单明了,却影响深远——比如为了赶上截止日期而重新分配资源,或者批准关键供应链组件的新供应商。另一些决策则可能具有变革性——比如重大的产品转型、进军新市场或重组部门。这些决策中有多少是基于公司历史数据和实际情况做出的?又有多少次,你仅仅因为相关细节难以获取,就只能依靠直觉或不完整的记忆?

Think about the business decisions you’ve made in the past year. Some were straightforward but impactful—like reallocating resources to meet a deadline or approving a new vendor for a critical supply chain component. Others may have been transformative—a major product pivot, entering a new market, or restructuring a department. How many of those decisions were fully informed by your company’s historical data and context? How often did you rely on instincts or incomplete recollections simply because the relevant details weren’t readily available?

现在,想象一下,一个拥有所有记忆的人工智能代理。当面临战略选择时,它会根据历史模式提供洞见。“还记得三年前,你犹豫是否要在类似市场推出那款产品吗?结果,由于你采取了精准的策略,它取得了巨大的成功。这次的新机遇也具有类似的特征——以下是上次行之有效的方法,以及你可以考虑的其他建议。” 这并非假设,而是切实可行的,它基于你组织的独特发展轨迹。

Now, imagine an AI agent that remembers it all. When faced with a strategic choice, it provides insights rooted in historical patterns. “Remember three years ago, when you hesitated to launch that product in a similar market? It turned out to be a huge success because of your targeted approach. This new opportunity shares similar characteristics—here’s what worked last time and what you might consider.” This isn’t hypothetical; it’s actionable, grounded in your organization’s unique trajectory.

这种记忆的力量远不止于被动决策。它能识别业务战略中的模式,凸显运营效率低下之处,并放大组织的优势。它不仅存储数据,还能赋予数据意义,将原始信息转化为可执行的洞察。这正是工具与真正业务伙伴之间的区别——真正的业务伙伴会与您共同成长,了解您的优先事项,并帮助您取得更明智、更自信的成果。

The power of such memory extends beyond reactive decision-making. It identifies patterns in your business strategies, highlights operational inefficiencies, and amplifies your organization’s strengths. It doesn’t just store data—it contextualizes it, turning raw information into actionable insights. This is the difference between a tool and a true business partner—one that evolves with you, understands your priorities, and helps you achieve smarter, more confident outcomes.

这就是人工智能的未来。它不仅更智能,而且在记忆、理解能力方面也更加人性化,能够帮助我们更有目标地生活。在未来,记忆不仅会被保存,更会被用来激发我们最好的自我。

This is the future of AI. Not just smarter, but profoundly more human in its ability to remember, understand, and help us live our lives with greater intention. It’s a future where memory isn’t just preserved—it’s used to unlock our best selves.

为什么长期记忆(LTM)对企业很重要?

Why is Long-Term Memory (LTM) Important for Businesses

LTM(长期记忆)在人工智能代理中的商业影响是变革性的,它重塑了企业与客户互动的方式,简化了运营流程,并带来了战略优势。试想一下,如果每一次客户服务互动都建立在前一次的基础上,那将会是怎样一番景象:人工智能能够记住您的偏好,预测您的需求,并在无需您重复描述的情况下解决问题。这种程度的个性化不仅能提升客户满意度,更能建立客户忠诚度。根据我们的经验,采用记忆型人工智能代理的企业报告称,客户满意度提升了 20% 至 30%,这得益于流畅、个性化的互动,这种互动更贴近人性,而非冷冰冰的交易。

The business impact of LTM in AI agents is transformative, reshaping how organizations interact with customers, streamline operations, and gain strategic advantages. Imagine a customer service experience where every interaction builds upon the last: an AI that recalls your preferences, anticipates your needs, and resolves issues without making you repeat yourself. This level of personalization doesn’t just improve satisfaction—it creates loyalty. According to our experience, companies leveraging memory-enabled AI agents have reported customer satisfaction increases of 20-30%, driven by seamless, tailored interactions that feel more human than transactional.

从运营层面来看,其优势同样显著。以一家物流公司为例,该公司需要应对供应链中断。借助具备记忆功能的AI,可以保留并实时应用以往挑战的模式,例如天气延误或区域瓶颈,从而更快地解决问题并更智能地分配资源。

Operationally, the benefits are just as profound. Consider a logistics firm managing supply chain disruptions. With memory-enabled AI, patterns from past challenges—like weather delays or regional bottlenecks—are retained and applied in real time, enabling faster resolutions and smarter resource allocation.

同样,借助人工智能代理进行新员工入职培训,培训时间也会缩短,因为这些系统能够记住组织架构的细微差别并提供一致的指导。根据我们的经验,在人工智能系统中采用长期记忆功能的企业,其错误率最多可降低 50%,从而减少低效环节并提升绩效。

Similarly, employees onboarding with the help of AI agents experience reduced training times as these systems remember organizational nuances and provide consistent guidance. According to our experience, businesses adopting long-term memory in their AI systems have seen error rates fall by as much as 50%, cutting inefficiencies and enhancing performance.

其战略优势更加引人注目。具备长期记忆的人工智能代理擅长识别随时间推移而出现的模式,并将原始数据转化为可执行的洞察。例如,金融机构可以利用此类代理来追踪客户行为的细微变化,从而在潜在风险升级之前识别它们。

The strategic advantages are even more compelling. AI agents with long-term memory excel at recognizing patterns over time, transforming raw data into actionable insights. For instance, a financial institution might leverage such agents to track subtle shifts in client behavior, identifying emerging risks before they escalate.

决策更加果断,预测更加精准,风险管理也更加积极主动。这些并非渐进式的改进,而是飞跃式的进步,能够为企业带来显著的竞争优势。

Decision-making becomes sharper, predictions more accurate, and risk management far more proactive. These are not incremental improvements—they’re leaps that give businesses a significant competitive edge.

通过使人工智能能够与组织一起学习、适应和成长,长期记忆将其从一种工具转变为战略合作伙伴,从而在企业的各个层面释放价值。

By enabling AI to learn, adapt, and grow alongside an organization, long-term memory transforms it from a tool into a strategic partner, unlocking value at every level of the enterprise.

当前LLM在长期记忆方面存在不足

Current LLMs Fall Short of Long-Term Memory

正如我们之前的实验所示,当我们询问有关塔希提岛潜水的信息时,像 ChatGPT 这样的现有大型语言模型缺乏长期记忆(LTM)。这种限制并非偶然——而是为了优化效率、保护隐私以及避免模型被过多数据淹没而有意做出的权衡。

As shown in our earlier experiment, when we asked about scuba diving in Tahiti, current LLMs like ChatGPT lack LTM. This limitation isn’t accidental—it’s a deliberate trade-off to optimize efficiency, protect privacy, and avoid overwhelming the model with excessive data.

为了解决这个问题,构建人工智能代理需要赋予大型语言模型(LLM)访问外部记忆的能力。在接下来的章节中,我们将探讨这些能力,它们代表着迈向记忆赋能人工智能的重要一步。然而,每种能力自身都存在局限性和权衡取舍。它们共同构成了当今创建更智能、更具适应性的人工智能代理的基础。

To address this, building AI agents requires giving LLMs access to external memory capabilities. In the following sections, we’ll explore these capabilities, which represent significant steps toward memory-enabled AI. However, each comes with its own limitations and trade-offs. Together, they form the foundation of today’s efforts to create smarter, more adaptable AI agents.

在智能体人工智能系统中设计和实现长期记忆

Designing and Implementing Long-Term Memory in Agentic AI Systems

在接下来的篇幅中,我们将详细介绍我们久经考验的内存系统设计方法、我们发现最有效的框架以及我们总结出的需要避免的陷阱。无论您是增强现有的人工智能代理,还是从零开始构建新的人工智能代理,本指南都将帮助您实现真正有价值的内存架构。

In the pages that follow, we’ll walk through our battle-tested approach to memory system design, the frameworks we’ve found most effective, and the pitfalls we’ve learned to avoid. Whether you’re enhancing existing AI agents or building new ones from the ground up, this guide will help you implement memory architectures that deliver real value.

理解记忆图景

Understanding the Memory Landscape

在深入实施之前,我们需要先明确人工智能体中“记忆”的含义。根据我们的经验,成功的记忆系统需要多种互补的记忆类型协同工作。

Before diving into implementation, we need to establish a clear understanding of what memory means in the context of AI agents. In our experience, successful memory systems require multiple, complementary types of memory working in concert.

在人工智能系统中,我们所说的“记忆”指的是一种结构化的机制,它允许智能体存储、检索和利用信息。与大多数语言模型所采用的简单上下文窗口不同,真正的人工智能记忆能够实现跨会话的持久性,并使其能够从过去的交互中学习。

When we talk about memory in AI systems, we’re referring to structured mechanisms that allow agents to store, retrieve, and utilize information over time. Unlike the simple context windows that characterize most language models, true AI memory creates persistence across sessions and enables learning from past interactions.

人工智能记忆通常分为三大类:

AI memory generally falls into three primary categories:

情景记忆代表了智能体的经验知识——发生了什么事,何时发生的,以及与谁发生的。这包括对话历史、用户随时间表达的偏好、过去采取的行动以及观察到的结果。情景记忆使智能体能够在交互过程中保持连续性,即使间隔数天或数周。例如,当一位财务顾问回忆起客户之前曾表示对低风险投资感兴趣时,这就是情景记忆在发挥作用。

Episodic Memory represents the agent’s experiential knowledge—what happened, when, and with whom. This includes conversation histories, user preferences expressed over time, past actions taken, and outcomes observed. Episodic memory allows our agents to maintain continuity across interactions, even when separated by days or weeks. When a financial advisory agent recalls that a client previously expressed interest in low-risk investments, that’s episodic memory at work.

语义记忆包含事实性知识——即我们的智能体“知道”而非“记得经历过”的事情。这包括领域知识、关于世界的事实、公司政策、产品目录以及任何其他独立于特定交互而存在的信息。我们发现,强大的语义记忆能够让智能体提供准确的信息,而无需针对每个请求查询外部系统。

Semantic Memory encompasses factual knowledge—the things our agent “knows” rather than “remembers experiencing.” This includes domain knowledge, facts about the world, company policies, product catalogs, and any other information that exists independently of specific interactions. We’ve found that robust semantic memory allows agents to provide accurate information without needing to query external systems for every request.

程序记忆捕捉“操作方法”知识——即指导客服人员完成复杂任务的操作顺序、决策树和工作流程。在构建客服人员时,我们会将故障排除流程编码到程序记忆中,使客服人员能够引导用户逐步完成复杂流程。

Procedural Memory captures the “how-to” knowledge—sequences of actions, decision trees, and workflows that guide the agent through complex tasks. When building customer service agents, we encode troubleshooting procedures into procedural memory, allowing the agent to walk users through complex processes step by step.

当这些记忆类型协同工作时,奇迹就会发生。在我们与一家医疗机构的合作中,我们构建了一个智能体,它可以回忆患者的病史(情景记忆)、应用临床指南(语义记忆)并遵循正确的流程安排手术(程序记忆)。最终,我们打造了一个既能提供个性化护理指导,又能严格遵守医疗法规的系统。

The magic happens when these memory types work together. In our work with a healthcare provider, we built an agent that could recall a patient’s medical history (episodic), apply clinical guidelines (semantic), and follow proper protocols for scheduling procedures (procedural). The result was a system that could provide personalized care guidance while maintaining strict compliance with healthcare regulations.


图 7.2:三种类型的长期记忆(来源:© Bornet 等人)

Figure 7.2: The Three Types of Long-Term Memory (Source: © Bornet et al.)

长期记忆的架构基础

Architectural Foundations for Long-Term Memory

构建高效的长期记忆需要周全的架构设计。通过我们的实践,我们完善了一种兼顾性能、可扩展性和实用性的记忆架构。

Building effective long-term memory requires a thoughtful architectural approach. Through our implementations, we’ve refined a memory architecture that balances performance, scalability, and practicality.

每种类型的记忆都需要针对其独特特性进行优化的特定存储解决方案:

Each type of memory requires specific storage solutions optimized for its unique characteristics:

对于情景记忆:我们采用快速高效的存储方式来保存近期交互和上下文历史记录。Redis 或类似的内存数据库最适合存储短期情景记忆,无需复杂的检索机制即可快速访问近期对话。存储内容应包含时间戳、用户标识符、交互摘要以及已识别的实体/意图。对于一家零售客户,与之前每次查询都会检索所有历史交互记录的解决方案相比,这种方法将响应延迟降低了 40%。

For Episodic Memory: We implement fast, efficient storage for recent interactions and contextual history. Redis or similar in-memory databases work best for storing short-term episodic memory, enabling quick access to recent conversations without complex retrieval mechanisms. The storage should include timestamps, user identifiers, interaction summaries, and identified entities/intents. For one retail client, this approach reduced response latency by 40% compared to their previous solution, which retrieved all historical interactions for every query.
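
To make this concrete, here is a minimal sketch of such an episodic store. A plain dict stands in for Redis so the example is self-contained; a real deployment would swap in `redis.Redis` with the same key scheme (`LPUSH` + `LTRIM` for capped lists). The key names, field names, and retention cap are illustrative assumptions, not a prescribed schema.

```python
import json
import time

class EpisodicStore:
    """Capped per-user log of interactions, newest last."""

    def __init__(self, max_per_user=50):
        self._db = {}                 # key -> list of JSON records (stand-in for Redis)
        self.max_per_user = max_per_user

    def log_interaction(self, user_id, summary, entities=None, intent=None):
        record = json.dumps({
            "ts": time.time(),        # timestamp enables recency-based retrieval
            "summary": summary,
            "entities": entities or [],
            "intent": intent,
        })
        key = f"episodic:{user_id}"
        self._db.setdefault(key, []).append(record)
        # keep only the most recent interactions (in Redis: LPUSH + LTRIM)
        self._db[key] = self._db[key][-self.max_per_user:]

    def recent(self, user_id, n=5):
        key = f"episodic:{user_id}"
        return [json.loads(r) for r in self._db.get(key, [])[-n:]]

store = EpisodicStore()
store.log_interaction("u42", "asked about aisle seats", intent="seat_preference")
store.log_interaction("u42", "booked flight to Tahiti", entities=["Tahiti"])
print([r["summary"] for r in store.recent("u42", 2)])
```

Capping the list per user is what keeps lookups fast: the agent only ever loads a bounded, recent slice of history rather than every interaction ever recorded.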

对于语义记忆:我们使用诸如 Pinecone 或 Weaviate 之类的向量数据库来存储语义嵌入——概念、事实和知识的数值表示,从而实现基于相似性的检索。这些数据库擅长查找概念上相关的信息,即使没有完全匹配的关键词也能做到。这种记忆类型需要结构化的组织方式,包括清晰的分类、关系和元数据,以方便精确检索。

For Semantic Memory: We implement vector databases like Pinecone or Weaviate to store semantic embeddings—numerical representations of concepts, facts, and knowledge that enable similarity-based retrieval. These databases excel at finding conceptually related information, even when exact keyword matches aren’t present. This memory type requires a structured organization with clear categorization, relationships, and metadata to facilitate precise retrieval.
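
The retrieval idea behind these vector databases can be sketched in a few lines. The 3-dimensional "embeddings" below are toy values chosen by hand; a real system would use a model's embeddings (hundreds of dimensions) and a vector database such as Pinecone or Weaviate rather than an in-memory dict.

```python
import math

def cosine(a, b):
    """Cosine similarity: how aligned two embedding vectors are."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Semantic memory: facts mapped to (toy) embedding vectors.
facts = {
    "Bonds carry lower risk than equities": [0.9, 0.1, 0.0],
    "Our refund window is 30 days":         [0.0, 0.2, 0.9],
    "Index funds track a market benchmark": [0.8, 0.3, 0.1],
}

def retrieve(query_vec, k=2):
    """Return the k facts whose embeddings are most similar to the query."""
    ranked = sorted(facts, key=lambda f: cosine(facts[f], query_vec), reverse=True)
    return ranked[:k]

# A query about low-risk investing, embedded near the first axis:
print(retrieve([1.0, 0.2, 0.0]))
```

Note that the two investing facts rank highest even though they share no keywords with each other — that similarity-over-keywords behavior is exactly what makes vector stores suited to semantic memory.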

对于程序记忆:我们以结构化格式存储工作流定义、决策树和流程图,通常使用传统的关系数据库或专用的工作流引擎。这些系统必须维护流程步骤、决策逻辑和条件分支的完整性。对于复杂的程序记忆,我们通常会实施版本控制来跟踪流程随时间的变化。

For Procedural Memory: We store workflow definitions, decision trees, and process maps in structured formats, typically using traditional relational databases or specialized workflow engines. These systems must maintain the integrity of process steps, decision logic, and conditional branching. For complex procedural memory, we often implement versioning to track changes to processes over time.
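
A minimal sketch of versioned procedural memory in a relational database, using Python's built-in `sqlite3`. The table layout, workflow name, and steps are illustrative assumptions — a production system would likely use a dedicated workflow engine — but the pattern of keeping every version and executing only the latest is the point.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE workflow_steps (
    workflow TEXT, version INTEGER, step_no INTEGER, action TEXT)""")

# Two versions of the same procedure: v2 adds an MFA check in the middle.
steps = [("reset_password", 1, 1, "verify identity"),
         ("reset_password", 1, 2, "send reset link"),
         ("reset_password", 2, 1, "verify identity"),
         ("reset_password", 2, 2, "check MFA enrollment"),
         ("reset_password", 2, 3, "send reset link")]
conn.executemany("INSERT INTO workflow_steps VALUES (?,?,?,?)", steps)

def latest_procedure(workflow):
    """Fetch the newest version's steps in order — the agent executes these."""
    (v,) = conn.execute("SELECT MAX(version) FROM workflow_steps WHERE workflow=?",
                        (workflow,)).fetchone()
    rows = conn.execute("""SELECT action FROM workflow_steps
                           WHERE workflow=? AND version=? ORDER BY step_no""",
                        (workflow, v)).fetchall()
    return [a for (a,) in rows]

print(latest_procedure("reset_password"))
```

Because older versions are never deleted, the agent's past behavior remains auditable even after a process changes — the versioning benefit described above.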

为了实现全面的智能体部署,我们将这些专用存储方式结合起来。在我们与一家法律科技公司的合作中,这种组合方法创建了一个智能体,该智能体能够从情景记忆中检索特定的案例引证(通过与关系数据库进行精确匹配),从语义记忆中访问法律原则(通过与向量数据库进行相似性匹配),并从程序记忆中遵循正确的法律分析流程(工作流系统)。这种组合方式在所有记忆类型中都兼顾了精确性和灵活性。

For comprehensive agent implementations, we combine these specialized stores. In our work with a legal tech company, this combined approach created an agent that could recall specific case citations from episodic memory (exact matching from a relational database), access legal principles from semantic memory (similarity matching from a vector database), and follow proper legal analysis procedures from procedural memory (workflow systems). This combination provided both precision and flexibility across all memory types.

实施路径:从零开始构建记忆

Implementation Pathway: Building Memory from the Ground Up

现在,让我们根据构建数十个内存增强型代理的经验,逐步介绍我们推荐的实现路径。

Now, let’s walk through our recommended implementation pathway based on our experience building dozens of memory-enhanced agents.

第一步:选择实施框架

Step 1: Select a Framework for Implementation

记忆架构确定后,下一步是选择一个支持记忆持久化、检索和更新机制的技术框架。基于我们在各个领域的实践经验,我们建议考虑以下经过验证的方案:

With the memory architecture defined, the next step is selecting a technological framework that supports memory persistence, retrieval, and updating mechanisms. Based on our implementations across various domains, we recommend considering these proven options:

LangChain为将情景记忆、语义记忆和程序记忆集成到基于 LLM 的智能体中提供了一个绝佳的起点。其记忆模块内置支持短期对话记忆(情景记忆)和长期知识存储(语义记忆和程序记忆)。我们发现 LangChain 对于快速原型开发和需要灵活集成的项目尤为宝贵。

LangChain provides an excellent starting point for integrating episodic, semantic, and procedural memory into LLM-powered agents. Its Memory module offers built-in support for short-term conversational memory (episodic) and long-term knowledge storage (semantic and procedural). We’ve found LangChain particularly valuable for rapid prototyping and projects where integration flexibility is important.
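
LangChain's memory module packages this conversational-buffer pattern out of the box. The stand-alone sketch below mimics the save/load cycle of such a buffer memory so the pattern is visible without the library; the class and method names are illustrative, not LangChain's actual API.

```python
class BufferMemory:
    """Short-term episodic memory: a rolling transcript of the conversation."""

    def __init__(self):
        self.turns = []

    def save_context(self, user_input, agent_output):
        # Called after each exchange to persist both sides of the turn.
        self.turns.append(("Human", user_input))
        self.turns.append(("AI", agent_output))

    def load_memory(self):
        # Rendered transcript that gets prepended to the next prompt,
        # giving the LLM continuity across turns.
        return "\n".join(f"{role}: {text}" for role, text in self.turns)

mem = BufferMemory()
mem.save_context("I prefer low-risk funds.", "Noted, I'll favor bonds and index funds.")
print(mem.load_memory())
```

In a framework like LangChain, the same two operations — save after a turn, load before the next prompt — happen automatically inside the agent chain, which is what makes it convenient for rapid prototyping.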

LlamaIndex(原名 GPT Index)能够动态地组织、存储和检索记忆。它尤其擅长创建和维护可高效查询的结构化知识索引,使其成为具有复杂语义记忆需求的应用的理想选择。对于需要访问大型知识库的特定领域代理,LlamaIndex 在我们的实际应用中已被证明非常有效。

LlamaIndex (Formerly GPT Index) helps organize, store, and retrieve memory dynamically. It excels at creating and maintaining structured knowledge indexes that can be efficiently queried, making it ideal for applications with complex semantic memory requirements. For domain-specific agents that need to access large knowledge bases, LlamaIndex has proven highly effective in our implementations.

步骤 2:定义记忆需求

Step 2: Define Memory Requirements

选定框架后,我们根据代理的用途,为每种记忆类型定义具体的记忆需求:

With the framework selected, we define specific memory requirements for each memory type based on the agent’s purpose:

对于情景记忆的需求,我们确定:

For Episodic Memory requirements, we determine:

哪些用户交互和对话元素需要在会话之间保留

Which user interactions and conversation elements need to persist across sessions

不同类型的情景信息应该保留多长时间?

How long different types of episodic information should be retained

应该跟踪哪些用户偏好和行为随时间的变化

Which user preferences and behaviors should be tracked over time

对于语义记忆需求,我们确定:

For Semantic Memory requirements, we determine:

智能体执行其任务需要哪些领域知识?

What domain knowledge is essential for the agent to perform its tasks

知识库应该包含哪些信息来源?

Which information sources should populate the knowledge base

必须维护哪些合规或监管信息

Which compliance or regulatory information must be maintained

对于程序记忆需求,我们确定:

For Procedural Memory requirements, we determine:

代理人需要遵循哪些工作流程和程序

Which workflows and processes the agent needs to follow

流程选择和执行遵循何种决策逻辑?

What decision logic governs process selection and execution

流程必须遵循到何种严格程度,以及允许多大的灵活性

How strictly processes must be followed versus allowing flexibility

例如,对于财务顾问,我们确定情景记忆应包含投资偏好、对话记录和以往的建议;语义记忆应包含投资产品详情和市场数据;程序性记忆应编码不同咨询场景下的合规工作流程。另一方面,交易详情将从安全系统中查询,而不是存储在顾问的记忆中。

For example, with a financial advisory agent, we determined that episodic memory should include investment preferences, conversation history, and previous advice given; semantic memory should contain investment product details and market data; and procedural memory should encode compliance workflows for different advisory scenarios. Transaction details, on the other hand, would be queried from secure systems rather than stored in the agent’s memory.

预先定义每种记忆类型的具体要求,可以防止记忆过载(存储过多,导致检索效率低下)和记忆缺口(存储不足,迫使用户重复操作)。

Defining these specific requirements for each memory type upfront prevents both memory overload (storing too much, creating retrieval inefficiencies) and memory gaps (not storing enough, forcing users to repeat themselves).

步骤三:构建检索机制

Step 3: Build Retrieval Mechanisms

存储就绪后,我们专注于构建高效的检索机制。目标是在恰当的时间调出恰当的记忆,同时避免用无关信息淹没智能体。

With storage in place, we focus on building efficient retrieval mechanisms. The goal is to bring up the right memories at the right time without overwhelming the agent with irrelevant information.

我们发现,检索增强生成(RAG)方法对记忆增强型智能体非常有效。简而言之,RAG 将知识检索与语言生成相结合。可以将其理解为赋予人工智能智能体在生成答案之前“查找信息”的能力,类似于人类在回答复杂问题之前查阅记忆或参考资料的做法。

We’ve found that a retrieval-augmented generation (RAG) approach works well for memory-enhanced agents. In simple terms, RAG combines the power of knowledge retrieval with language generation. Think of it as giving the AI agent the ability to “look things up” in its memory before formulating a response, similar to how humans might consult their memories or reference materials before answering a complex question.

以下是 RAG 的实际工作原理:当代理收到用户查询时,它会:

Here’s how RAG works in practical terms: When the agent receives a user query, it:

1.分析查询和当前上下文以生成检索线索

1. Analyzes the query and current context to generate retrieval cues

2.利用这些线索从每个存储系统中检索相关记忆

2. Uses these cues to fetch relevant memories from each storage system

3.根据相关性、时效性和重要性对检索到的记忆进行排序和筛选。

3. Ranks and filters the retrieved memories based on relevance, recency, and importance

4.将最相关的记忆融入到推理过程中

4. Incorporates the most relevant memories into its reasoning process

5.生成一个利用当前上下文和已检索记忆的响应

5. Generates a response that leverages both the current context and retrieved memories

这种方法显著提高了智能体提供准确、符合上下文的响应的能力。RAG 并非试图将所有可能的知识编码到模型本身,而是允许智能体动态地访问每次交互所需的特定信息。

This approach significantly improves the agent’s ability to provide accurate, contextually relevant responses. Rather than trying to encode all possible knowledge within the model itself, RAG allows the agent to dynamically access the specific information needed for each interaction.
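
The five RAG steps can be sketched end to end in a toy form: cue generation is reduced to keyword overlap, and "generation" to string assembly, so the shape of the pipeline stays visible without a model. The scoring weights and the hour-based recency decay are illustrative assumptions, not a recommended tuning.

```python
import time

MEMORIES = [
    {"text": "User prefers aisle seats", "ts": time.time() - 86400},
    {"text": "User has a shellfish allergy", "ts": time.time() - 3600},
    {"text": "User asked about Tahiti diving", "ts": time.time() - 60},
]

def relevance(memory, query):
    # Step 1-2 stand-in: keyword-overlap retrieval cue.
    words = set(query.lower().split())
    return len(words & set(memory["text"].lower().split()))

def retrieve(query, now=None, k=2):
    # Step 3: rank by relevance plus recency, keep the top k.
    now = now or time.time()
    def score(m):
        recency = 1.0 / (1.0 + (now - m["ts"]) / 3600)   # decays per hour elapsed
        return relevance(m, query) + recency
    return sorted(MEMORIES, key=score, reverse=True)[:k]

# Steps 4-5: fold the retrieved memories into the generation context.
query = "book diving in Tahiti"
context = "; ".join(m["text"] for m in retrieve(query))
print(f"[context: {context}] -> generate answer for: {query}")
```

Adjusting `k` and the decay rate is the toy analogue of tuning the relevance thresholds discussed below: too permissive and the context fills with tangents, too strict and important memories never surface.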

对于我们的法律研究助理来说,这种方法使代理人能够回忆起相关的案例法和法规,而不会在回复中充斥无关的引文。关键在于仔细调整相关性阈值——阈值太低,代理人会错过重要的先例;阈值太高,回复就会被无关信息淹没。

For our legal research assistant, this approach allowed the agent to recall relevant case law and statutes without overwhelming the response with irrelevant citations. The key was in carefully tuning the relevance thresholds—too low, and the agent would miss important precedents; too high, and responses would become cluttered with tangential information.

步骤 4:实施记忆整合

Step 4: Implement Memory Consolidation

记忆架构中最关键的组成部分或许是记忆巩固机制——它决定了哪些信息会被长期记住,哪些信息会被遗忘。它决定了哪些信息会从情景记忆转移到语义记忆或程序性记忆。通过实验,我们开发了一种多方面的记忆巩固方法:

Perhaps the most critical component of our memory architecture is the consolidation mechanism—the process that determines what to remember long-term and what to forget. It defines which information will be transferred from the episodic to the semantic or procedural memory. Through experimentation, we’ve developed a multi-faceted approach to memory consolidation:

基于重要性的整合会评估新信息的重要性。当用户提供关键细节(例如偏好、要求或反馈)时,我们的系统会将这些信息标记出来并长期存储。对于我们开发的旅行预订代理系统而言,这意味着要永久记住用户偏好靠过道的座位或对贝类过敏等信息。

Importance-based consolidation evaluates the significance of new information. When a user provides critical details like preferences, requirements, or feedback, our system flags this information for long-term storage. For a travel booking agent we developed, this meant permanently remembering that a user prefers aisle seats or has a shellfish allergy.

基于频率的记忆巩固方法追踪重复出现的模式。在互动中频繁出现的信息很可能很重要,并会被提升到长期记忆中。我们的教育辅导代理利用这一点来识别和解决学生反复出现的知识缺口。

Frequency-based consolidation tracks recurring patterns. Information that appears frequently across interactions is likely important and gets promoted to long-term memory. Our educational tutoring agent uses this to identify and address recurring knowledge gaps in students.

显式信息整合是指用户或系统主动标记信息以便记住。“记住我更喜欢晚上预约”这句话会在我们的日程安排助手里触发显式信息整合。

Explicit consolidation occurs when users or the system deliberately marks information for remembrance. “Remember that I prefer evening appointments” triggers explicit consolidation in our scheduling assistant.

整合的另一面是遗忘——这对系统效率同样至关重要。我们针对特定类型的信息实施基于时间的衰减机制,设置相关性阈值来归档不常用的信息,并在信息过时或错误时启用显式遗忘机制。

The flip side of consolidation is forgetting—equally important for system efficiency. We implement time-based decay for certain types of information, relevance thresholds that archive rarely accessed memories, and explicit forgetting mechanisms when information becomes outdated or incorrect.
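
The three consolidation signals and time-based forgetting can be combined into a single promotion score, sketched below. The weights, the 30-day half-life, and the threshold are illustrative assumptions; in practice they would be tuned per application.

```python
import time

def consolidation_score(item, now=None, half_life_days=30):
    """Combine importance, frequency, and explicit flags, discounted by age."""
    now = now or time.time()
    age_days = (now - item["ts"]) / 86400
    decay = 0.5 ** (age_days / half_life_days)        # time-based forgetting
    score = (2.0 * item.get("importance", 0)          # flagged details: allergies, preferences
             + 1.0 * item.get("frequency", 0)         # recurring patterns across interactions
             + 3.0 * item.get("explicit", 0))         # "remember that I prefer..."
    return score * decay

def should_consolidate(item, threshold=1.5):
    """Promote to long-term memory only above the threshold."""
    return consolidation_score(item) >= threshold

pref = {"ts": time.time(), "importance": 1, "frequency": 2, "explicit": 0}
chatter = {"ts": time.time() - 90 * 86400, "importance": 0, "frequency": 1}
print(should_consolidate(pref), should_consolidate(chatter))
```

A fresh, flagged preference clears the bar; three-month-old incidental chatter decays below it and gets archived — the forgetting half of the mechanism at work.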

第五步:将记忆与智能体推理相结合

Step 5: Integrate Memory with Agent Reasoning

最后一步实现是将记忆检索和巩固与智能体的核心推理过程相结合。这种结合发生在三个关键点:

The final implementation step integrates memory retrieval and consolidation with the agent’s core reasoning process. This integration occurs at three key points:

情境准备会在智能体生成响应之前,将相关的记忆信息整合到情境窗口中。对于我们的财务顾问智能体而言,这意味着在每次提出投资建议时,都要将客户的风险承受能力和投资目标纳入考量。

Context preparation incorporates relevant memories into the context window before the agent generates a response. For our financial advisor agent, this meant including the client’s risk profile and investment goals in the context of every investment recommendation.

实时检索功能允许客服人员在推理过程中,如果需要更多信息,可以从中提取额外的记忆。例如,如果对话转向技术细节,我们的客服人员可以在对话过程中检索产品规格。

In-process retrieval allows the agent to pull additional memories during its reasoning process if it identifies a need for more information. Our customer service agent could retrieve product specifications mid-conversation if the dialogue turned to technical details.

回复后整合会在用户回复后评估交互情况,以识别需要长期存储的信息。用户是否表达了新的偏好?他们是否更正了信息?这些信息都会被捕获并整合。

Post-response consolidation evaluates the interaction after a response is generated to identify information for long-term storage. Did the user express a new preference? Did they correct the information? These insights are captured and consolidated.

将记忆与推理相结合,可以将人工智能从无状态的响应生成器转变为随着每次交互而不断进化的系统,随着时间的推移变得越来越个性化和有效。

Integrating memory with reasoning transforms an AI from a stateless response generator to a system that evolves with each interaction, becoming increasingly personalized and effective over time.
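
The three integration points can be tied together in one request cycle, sketched below. `respond()` is a stand-in for the LLM call, and the preference-detection rule is deliberately naive — every name and rule here is an illustrative assumption showing where each step sits in the flow, not a working agent.

```python
class MemoryAwareAgent:
    def __init__(self):
        self.long_term = {"risk_profile": "conservative"}

    def prepare_context(self, query):
        # 1. Context preparation: relevant memories enter before generation.
        return f"known: {self.long_term} | query: {query}"

    def respond(self, context):
        # 2. In-process retrieval would hook in here, pulling extra memories
        #    if the model signals it needs more information mid-reasoning.
        return f"answer based on ({context})"

    def consolidate(self, query, answer):
        # 3. Post-response consolidation: capture new preferences/corrections.
        if "prefer" in query:
            self.long_term["stated_preference"] = query

    def handle(self, query):
        answer = self.respond(self.prepare_context(query))
        self.consolidate(query, answer)
        return answer

agent = MemoryAwareAgent()
agent.handle("I prefer dividend stocks")
print(agent.long_term["stated_preference"])
```

The important structural point is that consolidation runs after every response, so the next call to `prepare_context` already reflects what the last interaction taught the agent.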

近未来:即将出现的关键演变

The Near Future: Key Evolutions on the Horizon

在不久的将来,随着计算能力、算法和集成技术的进步,我们可以预期记忆增强型人工智能将取得显著提升。其中一项关键进展是上下文优先级算法的改进,这将使人工智能系统能够更准确地实时判断哪些记忆最为相关。这将减少检索延迟,并增强系统处理动态、快速变化信息的能力。

In the near future, we can expect significant improvements in memory-enhanced AI driven by advancements in computational power, algorithms, and integration techniques. One key evolution will be the refinement of contextual prioritization algorithms, which will allow AI systems to better determine which memories are most relevant in real-time. This will reduce retrieval delays and enhance the system’s ability to handle dynamic, fast-changing information.

另一项近期可能取得突破的领域是实时记忆整合,即系统能够即时分析和重组存储的数据。人工智能无需等待预定的更新,就能根据用户反馈和交互模式不断优化其记忆库。这将使系统更具适应性,无需重新训练即可应对新的挑战。

Another near-term breakthrough is likely in the area of real-time memory consolidation, where systems will analyze and reorganize stored data on the fly. Instead of waiting for scheduled updates, AI will be able to continuously refine its memory banks based on user feedback and interaction patterns. This will make systems more adaptive, improving their ability to respond to new challenges without retraining.

近期研究130强调了多智能体系统在通过分配内存负载和增强协作来改进记忆管理方面的潜力。这些多智能体系统允许不同的智能体专注于记忆的特定方面。例如,一些智能体可以处理情景记忆以回忆过去发生的事件,而另一些智能体则可以管理实时交互或长期知识。这种分工可以防止记忆过载并提高效率。系统中的智能体可以共享情景记忆中的关键经验,从而实现协作学习。例如,一个智能体对成功策略的记忆可以为另一个智能体的决策提供信息,从而减少系统中重复探索的需求。

Recent research130 emphasizes the potential of multi-agent systems in improving memory management by distributing the memory load and enhancing collaboration. These multi-agent systems allow different agents to focus on specific aspects of memory. For example, some agents can handle episodic memory to recall past events, while others can manage real-time interactions or longer-term knowledge. This division of labor prevents memory overload and improves efficiency. Agents in the system can share key experiences from their episodic memory, enabling collaborative learning. For instance, one agent’s memory of a successful strategy can inform another agent’s decision-making, reducing the need for redundant exploration across the system.

另一项最新研究将这一切提升到了一个全新的高度。谷歌研究院发表的研究论文《泰坦:在测试时学习记忆》介绍了一种突破性的神经长期记忆模块,该模块使人工智能系统即使在部署后也能持续学习和适应。与依赖向量数据库进行知识检索的传统人工智能模型(如前文所述)不同,泰坦将长期记忆集成到模型参数中,使其无需外部存储即可动态地记忆和回忆信息。

Another recent research brings all this to a whole new level. The research paper “Titans: Learning to Memorize at Test Time,” by Google Research, introduces a groundbreaking neural long-term memory module that enables AI systems to learn and adapt continuously even after deployment.131 Unlike traditional AI models that rely on vector databases for knowledge retrieval (as we have explained in the previous sections), Titans integrates long-term memory into the model’s parameters, allowing it to memorize and recall information dynamically without external storage.

另一个有趣的特性是它采用了基于意外的学习方法,优先处理意外或关键的输入,同时利用内置的遗忘机制丢弃过时的数据,从而确保效率并防止内存过载——这与人类记忆随着时间推移不断完善的过程非常相似。Titans 还引入了元学习,使 AI 能够即时记忆相关的经验而无需重新训练。这实现了实时适应,使 AI 在动态环境中更加灵活智能。

Another interesting feature is that it employs surprise-based learning, prioritizing unexpected or critical inputs while using a built-in forgetting mechanism to discard outdated data, ensuring efficiency and preventing memory overload—much like how human memory refines itself over time. Titans also introduces meta-learning, allowing AI to memorize relevant experiences on the fly without retraining. This enables real-time adaptation, making AI more responsive and intelligent in dynamic environments.

总而言之,Titans 能够实现更快、更优的记忆处理。同时,由于敏感数据保留在模型内部,降低了云端存储带来的隐私风险,安全性也更高。这标志着人工智能代理处理记忆方式的重大转变,我们迫不及待地想看看 Titans 将如何影响人工智能代理的未来发展。

In summary, Titans enables faster and better memory processing. It is also more secure because sensitive data stays within the model, reducing privacy risks associated with cloud-based storage. This marks a major shift in how AI agents handle memory, and we can’t wait to see how Titans impacts the future of AI agents.

通过反馈循环进行适应和学习

Adaptation and Learning through Feedback Loops

永不停歇的学习者:来自未来的愿景

The Unstoppable Learner: A Vision from the Future

公元2042年,在新加坡郊外一个繁忙的全自动化物流中心,一个名为Nexus的全新人工智能代理迎来了它的第一天。与以往高度专业化的系统不同,Nexus并没有接受过任何预先训练。事实上,它对即将面临的任务一无所知。它既没有预加载的指令,也没有针对此环境量身定制的精细算法。Nexus唯一携带的指令是:“优化整个物流中心的配送效率。”

The year is 2042. In a bustling, fully automated logistics hub outside of Singapore, a newly deployed AI agent named Nexus arrives on its first day. Unlike the hyper-specialized systems of the past, Nexus isn’t pre-trained for its role. In fact, it knows nothing about the tasks it’s about to face. There are no preloaded instructions and no finely tuned algorithms tailored for this environment. Instead, Nexus carries only one instruction: “Optimize delivery efficiency across the hub.”

起初,这套系统步履维艰。它将无人机分配到错误的区域,未能准确计算包裹重量,甚至叉车在仓库里混乱穿梭,造成了一些小小的混乱。工人们带着怀疑的目光看着这一切,不明白管理层为何要在如此关键的环境中启用未经训练的人工智能。但Nexus并未气馁。凭借着强大的通用学习能力和高度精密的反馈回路,它开始观察、试验并不断调整。

At first, the system stumbles. It assigns drones to the wrong zones, fails to account for package weight, and even causes a few minor disruptions as forklifts move chaotically through the warehouse. Workers look on skeptically, wondering why management would unleash an untrained AI in such a critical setting. But Nexus isn’t deterred. Armed with the power of universal learning and a highly sophisticated feedback loop, it begins to observe, experiment, and adapt.

几个小时内,Nexus就开始识别出各种模式。运往同一目的地的包裹会被合并。无人机根据仓库拥堵情况调整飞行路线。它注意到叉车经常在等待任务时闲置,于是重新配置调度,最大限度地利用叉车。错误仍然会发生,但比以前少了很多。到了第二天,Nexus的效率就翻了一番。到周末,它已经掌握了物流中心的复杂运作机制,甚至超越了经验最丰富的规划人员。

Within hours, Nexus starts to pick up patterns. Packages bound for the same destination are consolidated. Drones adjust their flight paths based on warehouse congestion. It notices that forklifts often idle while waiting for tasks, so it reconfigures scheduling to maximize their use. Mistakes still happen, but fewer than before. By the second day, Nexus has doubled its efficiency. By the end of the week, it has mastered the complex choreography of the logistics hub, surpassing even the most experienced human planners.

Nexus系统投入使用两周后,物流中心发生了翻天覆地的变化。包裹的运输精准得近乎诡异,无人机的运行协调完美无瑕,叉车的停机时间也彻底消失了。曾经对Nexus系统持怀疑态度的员工现在都依赖它了。令人惊叹的是,它能够适应哪怕是最微小的干扰。管理层开始讨论将Nexus扩展到全球其他枢纽。

Two weeks after Nexus’s arrival, the logistics hub has transformed. Packages move with almost eerie precision, drones operate with perfect coordination, and downtime for forklifts has vanished. Workers who once doubted Nexus now rely on it, marveling at its ability to adapt to even the smallest disruptions. Management begins discussing expanding Nexus to other hubs worldwide.

Nexus 的核心理念是革命性的:通用学习器,一种无需事先训练即可胜任任何任务的人工智能代理,它能通过反复试验、不断纠错和反馈来精通各项技能。这种系统的优势远不止于物流方面。试想一下,如果让一个人工智能代理来设计一个全新的营销活动,它会如何运作?它对你的品牌和目标受众一无所知,却能立即开始试验——测试标语、分析互动数据并不断优化策略。短短几天内,它就能生成一个精准高效、足以媲美经验丰富的创意团队的营销方案。

What Nexus embodies is a revolutionary concept: a universal learner, an AI agent that can take on any task, even without prior training, and achieve mastery through trial, error, and feedback. The benefits of such a system extend far beyond logistics. Imagine an AI agent tasked with designing a new marketing campaign. It knows nothing about your brand or audience but begins experimenting—testing slogans, analyzing engagement data, and refining its approach. Within days, it generates a campaign so tailored and effective that it rivals the work of seasoned creative teams.

反馈循环是这种神奇力量背后的引擎。未来的智能体(例如 Nexus)不再依赖于需要用海量数据集进行预训练的静态模型,而是依靠动态学习。它们会采取行动、观察结果并进行调整,每次迭代都会变得更加智能。这种方法不仅使人工智能具有适应性,更使其势不可挡

Feedback loops are the engine behind this magic. Instead of relying on static models that require pre-training on vast datasets, future agents like Nexus rely on dynamic learning. They act, observe outcomes, and adjust, growing smarter with every iteration. This approach doesn’t just make AI adaptable—it makes it unstoppable.

这对商业和社会的影响是巨大的。随着通用学习能力的出现,人工智能部署的障碍将不复存在。企业不再需要投入数百万美元来训练系统以执行狭窄的专业任务。相反,人工智能可以被部署到任何环境中,设定目标,然后自主学习。这降低了部署成本,缩短了生产力提升所需的时间,并消除了每次流程或目标发生变化时都需要重新训练系统的瓶颈。

The implications for business and society are staggering. With universal learning, the barriers to deploying AI vanish. Companies no longer need to invest millions in training systems for narrow, specialized tasks. Instead, AI can be dropped into any environment, given a goal, and left to learn. This reduces deployment costs, shortens the time to productivity, and eliminates the bottleneck of retraining systems every time processes or goals change.

更重要的是,像Nexus这样的通用学习系统本身就具有很强的适应能力。它们能够在动态环境中茁壮成长,不断适应新的挑战。例如,如果物流中心的规则在一夜之间发生变化,比如引入可生物降解的包装或对无人机的重量有限制,Nexus也不需要更新。它的反馈回路会自动吸收新的参数,并相应地调整自身的行为。

More importantly, universal learners like Nexus are inherently resilient. They thrive in dynamic environments, continuously adapting to new challenges. If the rules of the logistics hub were to change overnight, for example, by introducing biodegradable packaging or drones with weight restrictions, Nexus wouldn’t need an update. Its feedback loop would simply absorb the new parameters and adjust its behavior accordingly.

当然,这种未来愿景也引发了一些问题。当通用学习者遇到模糊不清的目标时,会发生什么?我们如何确保他们在适应过程中遵循伦理规范?答案在于我们如何设计他们的反馈机制。例如,Nexus 系统配备了多目标优化机制,在效率、安全性和环境可持续性之间取得平衡。伦理框架已融入其学习过程,确保任何试错都不会损害核心价值观。

Of course, this vision of the future raises questions. What happens when universal learners encounter ambiguous goals? How do we ensure they act ethically as they adapt? The answer lies in how we design their feedback loops. Nexus, for instance, was equipped with multi-objective optimization, balancing efficiency with safety and environmental sustainability. Ethical frameworks are baked into its learning process, ensuring that no amount of trial and error compromises core values.

另一个需要考虑的因素是透明度。企业领导者必须确保这些智能体能够解释自身的决策,从而建立利益相关者的信任。如果反馈机制的设计以问责制为核心,就可以包含可解释性机制——说明某项决策的原因,以及它如何与目标保持一致。

Another consideration is transparency. Business leaders must ensure that these agents explain their decisions, fostering trust among stakeholders. Feedback loops, when designed with accountability in mind, can include explainability mechanisms—offering insights into why a decision was made and how it aligns with goals.

反馈回路是自适应系统的基石,它使人工智能代理能够不断改进决策、从错误中学习,并随着时间的推移更加紧密地与目标保持一致。对于商业领袖而言,这个概念看似简单却意义非凡:允许人工智能系统分析自身表现、从结果中学习,并利用这些学习成果来改进未来的行动。这正是代理在第一天表现尚可和在第一千天后表现卓越之间的区别。

Feedback loops are the backbone of adaptive systems, enabling AI agents to refine their decision-making, learn from mistakes, and align more closely with goals over time. For business leaders, the concept is deceptively simple yet transformative: allow AI systems to analyze their performance, learn from outcomes, and use that learning to improve future actions. It’s the difference between an agent that performs adequately on day one and one that excels on day one thousand.

反馈回路的魔力

The Magic of Feedback Loops

人工智能中的反馈回路反映了地球生命的进化过程。在进化过程中,自然选择扮演着反馈机制的角色,那些能够增强生存和繁殖的性状会在世代间得到强化。类似地,在人工智能中,诸如强化学习之类的反馈回路使系统能够通过试错进行学习。强化学习的工作原理是奖励那些能够达成既定目标或提升性能的理想行为,同时惩罚那些失败的行为,正如在进化过程中,有益的性状会通过遗传给后代而得到“奖励”一样。随着时间的推移,这个过程会不断塑造人工智能的行为,从而最大限度地提高成功率。

Feedback loops in AI mirror the evolutionary processes of life on Earth. In evolution, natural selection acts as a feedback mechanism, where traits that enhance survival and reproduction are reinforced over generations. Similarly, in AI, feedback loops like reinforcement learning enable systems to learn through trial and error. Reinforcement learning132 works by rewarding desirable actions—those that achieve a set goal or improve performance—while penalizing failures, just as beneficial traits are “rewarded” in evolution by being passed on to future generations. Over time, this process shapes the AI’s behavior to maximize successful outcomes.

正如物种通过不断迭代改进来适应环境一样,人工智能模型也通过学习循环不断演进,从错误中学习并改进其输出。两者都依赖于持续的输入、适应和迭代来实现复杂性和效率,随着时间的推移,将随机变化转化为优化的、有目的的系统。

Just as species adapt to their environments through iterative refinements, AI models evolve through learning cycles, learning from errors and refining their outputs. Both rely on continuous input, adaptation, and iteration to achieve complexity and efficiency, transforming random variations into optimized, purposeful systems over time.

反馈回路的核心在于行动、观察和调整的循环。人工智能体采取行动,接收关于其结果的反馈,并利用这些信息来改进自身的行为。这种循环可以实时运行,例如自动驾驶汽车学习如何在繁忙的十字路口行驶;也可以持续运行更长时间,例如推荐系统根据客户偏好不断演进。

At its core, a feedback loop is a cycle of action, observation, and adjustment. An AI agent takes an action, receives feedback on its outcome, and uses that information to refine its behavior. This loop can operate in real-time, as in autonomous vehicles learning to navigate a busy intersection, or over longer periods, as in recommendation systems that evolve with customer preferences.

为什么这很重要?传统的AI系统通常在静态环境中运行,基于固定的数据集进行训练,这些数据集只能捕捉现实的某个瞬间。但现实世界远非静止不变。市场环境瞬息万变,客户偏好不断演变,竞争对手也在不断创新。如果没有适应机制,AI系统很快就会过时。反馈回路通过将智能体转变为一个“活系统”来解决这个问题——这个系统会在与世界的互动中不断成长,变得更加智能。

Why does this matter? Traditional AI systems often operate in static environments, trained on fixed datasets that capture a snapshot of reality. But the real world is anything but static. Market conditions shift, customer preferences evolve, and competitors innovate. Without a mechanism to adapt, AI systems quickly become obsolete. Feedback loops solve this problem by turning the agent into a living system—one that grows smarter as it interacts with the world.

反馈回路是如何运作的?

How Do Feedback Loops Work?

一切都始于反馈,而反馈可以通过不同的方式收集和应用:

It all starts with feedback, which can be collected and applied in different ways:

明确的用户反馈:代理会向用户征求评分、确认或更正(例如,“这个答案有帮助吗?”)。

Explicit User Feedback: Agents ask users for ratings, confirmations, or corrections (e.g., “Did this answer help?”).

隐式反馈:系统跟踪用户行为,例如任务完成率、放弃的工作流程或回复更正。

Implicit Feedback: The system tracks user behavior, such as task completion rates, abandoned workflows, or response corrections.

系统级强化信号:奖励模型强化良好行为(例如,成功执行任务),并阻止失败行为(例如,错误建议)。

System-Level Reinforcement Signals: Reward models reinforce good actions (e.g., a successful task execution) and discourage failures (e.g., incorrect recommendations).

For example, a customer support AI agent can ask, “Did this resolve your issue?” If the user selects “No,” the agent logs this as a failure and adjusts its approach in future interactions.
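
The three feedback channels above can be expressed as one small normalizing function. This is an illustrative sketch; the channel names, payload fields, and numeric values are assumptions made for the example, not part of any specific product.

```python
# Map each feedback channel described above to a numeric signal the
# agent can learn from. Values and field names are illustrative.

def feedback_signal(kind, payload):
    if kind == "explicit":
        # e.g. payload from "Did this resolve your issue?"
        return 1.0 if payload["helpful"] else -1.0
    if kind == "implicit":
        # e.g. task completed vs. workflow abandoned
        return 0.5 if payload["task_completed"] else -0.5
    if kind == "reinforcement":
        # a reward already computed by a system-level reward model
        return payload["reward"]
    raise ValueError(f"unknown feedback kind: {kind}")

print(feedback_signal("explicit", {"helpful": False}))        # -> -1.0
print(feedback_signal("implicit", {"task_completed": True}))  # -> 0.5
```

A customer selecting "No" flows through the explicit branch and is logged as a negative signal, exactly the failure case described above.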

To operate their feedback loop, the agents work in a cycle of execution, evaluation, and adjustment:

Observation: The agent gathers data from interactions (user feedback, task success rates, implicit cues).

Evaluation: The system assigns a reward score (positive or negative) based on feedback.

Adjustment: The AI agent refines future responses by updating its model, prompts, or workflows.

Deployment: The improved agent is deployed and continues learning over time.

For example, a personal finance AI assistant initially gives generic budget advice. If users reject its suggestions or request modifications, it adapts its future recommendations based on patterns of user preferences.
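
The execution–evaluation–adjustment cycle above can be sketched in a few lines of Python. Everything here is a toy assumption made for illustration: the two response styles, the starting scores, and the update rule.

```python
# Toy sketch of the observe -> evaluate -> adjust cycle: the agent picks
# the response style with the highest learned score and nudges that
# score up or down based on user feedback.

class FeedbackLoopAgent:
    def __init__(self):
        # Hypothetical action options and their learned scores.
        self.scores = {"concise": 0.5, "detailed": 0.5}
        self.learning_rate = 0.125

    def act(self):
        # Choose the currently best-scoring style (first key wins ties).
        return max(self.scores, key=self.scores.get)

    def observe(self, action, user_said_helpful):
        # Evaluation: turn feedback into a reward, then adjust.
        reward = 1.0 if user_said_helpful else -1.0
        self.scores[action] += self.learning_rate * reward

agent = FeedbackLoopAgent()
for helpful in [False, False, True]:  # simulated user feedback
    agent.observe(agent.act(), helpful)

print(agent.scores)  # -> {'concise': 0.5, 'detailed': 0.375}
```

Production agents replace the score table with model updates, prompt revisions, or workflow changes, but the loop's shape is the same.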

A Manufacturing Feedback Loop Success Story

In late 2022, one of Europe’s largest automotive parts manufacturers faced a critical challenge: their quality control process couldn’t keep pace with production. Traditional automation was failing to catch subtle defects that human inspectors could spot instantly. The solution they implemented would transform not just their quality control but our understanding of how AI agents learn.

“We had tried everything,” recalls their Head of Operations. “Standard computer vision, rule-based systems, even basic AI models. But defects kept slipping through. What we needed wasn’t just a smarter system—we needed one that could learn and adapt like our best human inspectors.”

The breakthrough came with the implementation of an AI agent equipped with sophisticated feedback loops. Instead of just flagging defects, the system tracked the outcomes of its decisions, learning from both successes and failures. When it correctly identified a subtle defect, that pattern strengthened in its memory. When it missed one, it adjusted its detection parameters.

But the real magic happened three months into the deployment. The system began identifying potential defects that even experienced inspectors hadn’t noticed—subtle patterns that preceded more obvious flaws. “It was like having an inspector who could not only spot problems but predict them,” explains their Quality Control Director. “The AI wasn’t just learning from feedback—it was discovering new insights we hadn’t even considered.”

The results were transformative: quality control accuracy improved by 32%, while inspection time decreased by 45%. More importantly, the system continued to improve month after month, identifying new patterns and refining its understanding of the manufacturing process.

The Strategic Advantage

For business leaders, feedback loops offer more than just incremental improvement—they provide a strategic advantage in the AI arms race. Companies that embrace feedback loops can outpace competitors by building systems that adapt faster, learn better, and stay relevant longer. In a world where the pace of change is accelerating, the ability to evolve is not just an asset; it’s a necessity. By integrating feedback loops into their AI agents, businesses can unlock new levels of efficiency, personalization, and innovation.

Feedback loops address some of the most pressing challenges in AI implementation. Consider customer experience. An AI agent designed to handle customer inquiries might perform well on launch day, but what happens when users begin asking new types of questions or expressing frustration in novel ways? Without feedback loops, the agent stagnates, offering irrelevant responses and driving customer dissatisfaction. With feedback loops, the chatbot learns from every interaction, adjusting its responses to better meet customer needs.

Feedback loops also shine in dynamic industries like e-commerce, where AI agents must react to changing demand patterns, inventory levels, and competitor pricing. They enable systems to stay relevant, ensuring that recommendations and decisions are always in tune with current realities.

Let’s illustrate the impact of feedback loops with two real-world examples we worked on. The first one is very interesting because it shows how AI agents can optimize industrial efficiency by predicting and preventing costly failures in manufacturing. This highlights how feedback loops not only improve technical accuracy but also directly contribute to measurable business outcomes, such as increased profitability and reduced downtime.

Example 1: AI Agent for Predictive Maintenance in Manufacturing

1. Initial Task: The AI agent monitors factory machinery to predict potential breakdowns and minimize downtime. For example, it analyzes data from sensors tracking vibration, temperature, and wear.

2. Action Generation: Based on the retrieved insights, the AI generates actionable recommendations:

“The vibration pattern suggests bearing wear in Machine X. Schedule a bearing replacement within the next 72 hours to prevent failure.”

3. Automated Feedback Through Revenue Metrics:

The system tracks the financial outcomes of its actions using predefined indicators, such as reduced downtime, lower repair costs, or increased output.

If the maintenance intervention prevents a breakdown, it records this as a positive outcome and links it to the specific recommendation and retrieved data.

4. Positive Reinforcement Learning:

The AI reinforces the association between vibration patterns and bearing wear in its predictive model.

It flags the retrieved data as highly relevant for similar issues, improving its retrieval accuracy for future anomalies.

5. Updating the memory:

Maintenance logs and outcomes from this event are added to the database, creating new knowledge the system can draw from in the future.

The system also incorporates cost-benefit analysis, associating specific actions with the revenue saved or generated.

6. Adaptive Behavior: Over time, the AI becomes better at identifying subtle signs of failure earlier, optimizing its recommendations to reduce costly downtime. It may also learn to prioritize actions based on financial impact, ensuring the most critical interventions are addressed first.

As an outcome, the AI agent maximizes efficiency by minimizing production losses, reducing expensive emergency repairs, and increasing the overall efficiency of the manufacturing process. Each successful prediction and intervention refines its models and retrieval database, enhancing its long-term profitability.
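
A minimal sketch of the feedback step in this example might look as follows. The vibration readings, the 5.0 mm/s starting threshold, and the update rule are all invented for illustration.

```python
# The agent flags machines whose vibration exceeds a threshold, then
# tightens or relaxes that threshold based on confirmed outcomes.
# All numbers here are illustrative assumptions.

def should_flag(vibration_mm_s, threshold):
    return vibration_mm_s > threshold

def update_threshold(threshold, flagged, breakdown_occurred, step=0.25):
    if breakdown_occurred and not flagged:
        return threshold - step  # missed failure: become more sensitive
    if flagged and not breakdown_occurred:
        return threshold + step  # false alarm: become less sensitive
    return threshold             # prediction matched reality

threshold = 5.0  # mm/s, assumed starting point
# (vibration reading, did the machine actually break down?)
history = [(4.8, True), (4.6, True), (5.5, False)]
for reading, broke in history:
    threshold = update_threshold(threshold, should_flag(reading, threshold), broke)

print(threshold)  # -> 4.75
```

Each outcome also maps naturally to a financial score (downtime avoided, repair cost incurred), which is how the revenue-linked feedback of step 3 would enter the loop.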

Example 2: AI Agent for Personalized Product Recommendations

The second example demonstrates the transformative potential of feedback loops in e-commerce, where personalization drives customer satisfaction and revenue growth. It showcases how an AI agent learns from user behavior to refine its recommendations, creating a cycle of continuous improvement that aligns perfectly with dynamic market demands and individual preferences.

1. Initial Interaction: A user visits an e-commerce site and searches for “comfortable running shoes for trail running.” The AI agent processes this input and generates an embedding that captures the user’s preference for comfort and trail-specific footwear.

2. Response Generation: The AI generates a tailored recommendation:

“Based on your search, we recommend the TrailMax Comfort Runner, designed for rugged terrains with extra cushioning. It’s available in your size for $120. Would you like to see reviews or add it to your cart?”

3. Automated Feedback Collection:

The agent tracks whether the user clicks the recommendation, adds the product to the cart, or completes the purchase.

Positive signals (e.g., a purchase or click-through) reinforce the recommendation’s success.

Negative signals (e.g., the user ignores the suggestion or continues searching) indicate the need for improvement.

4. Positive Reinforcement Learning:

If the user buys the TrailMax Comfort Runner, the system treats this as a positive outcome, reinforcing similar recommendations for future users with similar queries.

The AI updates its embeddings, associating comfort and trail running more strongly with products that perform well in sales and customer satisfaction.

5. Updating the memory:

Products with consistently high conversion rates for specific queries are flagged as top-performing and prioritized in future recommendations.

The AI also integrates user-generated feedback, such as reviews and ratings, into its database to improve its understanding of customer preferences.

6. Adaptive Behavior: Over time, the AI learns to prioritize products that align with user preferences and have a higher likelihood of generating revenue. For instance, it may start suggesting slightly higher-priced items with better reviews if they have a strong track record of driving purchases.

When another user searches for “trail running shoes,” the AI immediately suggests top-performing, high-revenue products tailored to similar preferences, improving the chances of conversion and increasing revenue. This feedback loop ensures that the agent’s behavior continuously adapts to customer preferences and business goals.
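
The recommendation loop in this example can be sketched as a score table keyed by query and product. The signal weights are assumptions for illustration; the TrailMax name comes from the example above, while `RoadLite Sprinter` is invented.

```python
from collections import defaultdict

# Feedback signals and their assumed weights.
SIGNAL_WEIGHTS = {"purchase": 1.0, "click": 0.25, "ignored": -0.5}

# query -> product -> learned score
scores = defaultdict(lambda: defaultdict(float))

def record_feedback(query, product, signal):
    scores[query][product] += SIGNAL_WEIGHTS[signal]

def recommend(query):
    ranked = scores[query]
    return max(ranked, key=ranked.get) if ranked else None

q = "trail running shoes"
record_feedback(q, "TrailMax Comfort Runner", "purchase")
record_feedback(q, "RoadLite Sprinter", "ignored")
record_feedback(q, "TrailMax Comfort Runner", "click")

print(recommend(q))  # -> TrailMax Comfort Runner
```

A production system would layer embeddings, per-user context, and revenue weighting on top, but the reinforcement mechanics reduce to this kind of score update.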

Performance Today: What Feedback Loops Deliver

Feedback loops are already delivering transformative performance improvements in various industries. Today, based on our experience, AI agents equipped with well-designed feedback loops can achieve remarkable feats of adaptability and precision. For example, they can improve their response accuracy by over 20% within weeks of deployment by analyzing and learning from user interactions. In e-commerce, recommendation engines powered by feedback loops often see click-through rates increase by 10-30% as they adapt to evolving customer preferences in real-time.

However, performance isn’t uniform across all applications. The effectiveness of feedback loops depends on the complexity of the environment and the quality of data. In relatively stable settings, like fraud detection or inventory management, feedback loops can lead to near-optimal performance within months. In more dynamic or unpredictable environments, such as financial markets or human behavior modeling, the improvement curve may be slower, but the potential for breakthroughs is immense.

Limitations and Ethical Considerations

To effectively leverage feedback loops, business leaders must address several critical factors to ensure their AI systems function optimally and ethically.

A key consideration is timeliness. Feedback that is delayed or outdated significantly hinders an AI system’s ability to adapt to dynamic environments. Systems must process and act on data as close to real-time as possible to remain relevant, especially in fast-paced industries like finance or logistics. Without timely feedback, the entire loop can falter, leading to suboptimal outcomes.

Robust infrastructure is another key consideration. Feedback loops require systems capable of handling large volumes of data efficiently. Cloud platforms such as AWS or Azure provide scalable solutions for collecting, processing, and analyzing data at the scale necessary for complex AI systems. Without this infrastructure, organizations face bottlenecks that can limit their AI’s adaptability and performance.

Human oversight is critical to avoid unintended consequences. While feedback loops enable AI systems to operate with significant autonomy, their decisions must be monitored to ensure alignment with organizational goals and ethical standards. Regular audits of system outputs are essential, both to ensure accuracy and to address potential deviations or unintended behaviors.

An ethical concern is user manipulation. In their drive to optimize outcomes, AI systems—particularly recommendation engines—can exploit psychological triggers or push users toward addictive content. While such practices may yield short-term gains, they risk damaging trust and long-term relationships with users. Organizations must strike a balance, designing feedback loops that prioritize user well-being alongside business objectives.

Overfitting is a common limitation where AI systems become too focused on optimizing a specific metric, losing sight of the bigger picture. For instance, a customer service AI might prioritize reducing response times so much that it sacrifices the quality of its answers. This happens because the AI learns to “memorize” patterns that work well for one goal rather than adapting to a variety of scenarios. To prevent this, multi-objective optimization is used to balance priorities like speed, accuracy, and user satisfaction, ensuring the AI performs well across multiple areas without becoming too narrowly focused.
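
One common way to implement the multi-objective balancing described here is a weighted composite reward. The weights and metric values below are assumptions, and real systems often use more sophisticated schemes (e.g., Pareto-based methods); this sketch only shows the principle.

```python
# Blend several normalized metrics (each in [0, 1]) into one reward so
# that no single metric can dominate. Weights are illustrative.
WEIGHTS = {"speed": 0.2, "accuracy": 0.5, "satisfaction": 0.3}

def composite_reward(metrics):
    return sum(WEIGHTS[name] * value for name, value in metrics.items())

# A fast but sloppy answer vs. a slower, accurate one.
fast_sloppy = {"speed": 1.0, "accuracy": 0.4, "satisfaction": 0.3}
slow_good = {"speed": 0.5, "accuracy": 0.9, "satisfaction": 0.9}

# Under these weights the accurate answer wins despite being slower.
print(composite_reward(slow_good) > composite_reward(fast_sloppy))  # -> True
```

An agent trained purely on speed would prefer `fast_sloppy`; the blended reward prevents exactly the response-time-over-quality failure described above.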

In practice, successful feedback loop implementation requires careful design, vigilant monitoring, and ethical foresight. When done right, feedback loops can drive AI systems to continuously improve and adapt, providing immense value to businesses.

The Evolution Ahead: Near-Term and Long-Term Outlook

In the near term, we can expect feedback loops to become more seamless and automated. Advances in real-time data processing and edge computing are already enabling faster and more efficient feedback cycles. For instance, real-time personalization in marketing will become more refined as AI systems integrate instantaneous user responses to adjust their strategies dynamically. In logistics, AI-powered feedback loops will revolutionize operations by responding to traffic conditions, weather changes, and demand fluctuations on the fly, reducing delivery times and costs.

Over the next few years, the integration of reinforcement learning with advanced feedback mechanisms will push boundaries further. Agents will not only learn from individual actions but also develop the ability to generalize learning across tasks. For example, a warehouse robot trained to stack boxes could transfer its learning to other tasks, like assembling products, with minimal retraining. This cross-task adaptability will significantly expand the applicability of feedback loops, making AI systems more versatile and resilient.

In the longer term, AI agents will likely integrate multiple feedback loops simultaneously, each addressing a different aspect of performance, such as speed, quality, and user satisfaction. Imagine an AI agent that not only learns your schedule but also adapts its tone, style, and even its level of proactivity based on nuanced feedback from your interactions.

Recent work by Hospedales et al. suggests we’re approaching what they call “meta-learning systems”—AI agents that don’t just learn from experience but learn how to learn more effectively.133 Their early experiments show these systems adapting to new situations up to 3 times faster than traditional learning approaches.

Self-optimizing structures and autonomous evolution represent two complementary aspects of structural adaptability in AI. Self-optimizing structures focus on task-specific adjustments, where feedback loops enable AI systems to dynamically reconfigure their architecture in real-time for efficiency and performance. For instance, recent research showcases how AI can optimize itself layer by layer during training, ensuring it remains relevant and resource-efficient in the face of immediate demands.134

Autonomous evolution, however, takes this adaptability further. As demonstrated by recent studies,135 AI agents can use feedback not only to optimize for the present but to evolve entirely new frameworks that prepare them for future tasks or environments. This process mirrors biological evolution, where survival depends on iterative growth and adaptation, making AI systems capable of tackling challenges beyond their initial design.

***

Together, these advancements show the transformative potential of feedback loops in shaping AI systems that are not only more knowledgeable but also structurally adaptable. This evolution positions feedback as the driving force behind AI that learns, evolves, and aligns with both practical and human-centered goals, offering a glimpse of a future where machines grow alongside humanity.

Understanding how memory works in AI is one thing—implementing it successfully is another challenge entirely. As one technology leader at a Fortune 100 company told us, “We thought adding memory to our AI would be like upgrading computer RAM. Instead, it was like teaching a child how to learn from experience—complex, nuanced, but incredibly powerful when done right.”

The implementation stories ahead reveal both the pitfalls and the proven paths to success. You’ll discover why some organizations achieve transformative results with memory-enabled AI while others struggle to see any benefit. More importantly, you’ll learn the practical steps to ensure your implementation lands on the right side of this divide.

Through our work with hundreds of organizations, we’ve distilled these lessons into a clear framework for success. Whether you’re just starting your journey with AI memory or looking to enhance an existing AI memory system, this framework will help you.

Best Practices in Managing Memory for Agents

Leading Teams to Use Agent’s Memory the Right Way

Our experience has shown that implementing memory in AI agents is only the starting point. To unlock its full potential, it’s critical to adopt practices that ensure this capability delivers measurable benefits while avoiding inefficiencies or risks. Like any powerful tool, the true value of memory depends on how it’s used. Over the years, we’ve developed strategies that allow businesses to leverage AI memory effectively, minimize pitfalls, and maintain trust.

First, relevance is key. AI memory systems are only as effective as the guidance they receive. Users play an active role in shaping what the AI retains by emphasizing critical information during interactions. This might involve repeating key points, explicitly stating their importance, or using structured protocols to tag certain information as “important” or “archivable.” For instance, in project management, users might highlight milestones, decisions, and obstacles that are vital for long-term tracking. However, the goal isn’t for the AI to remember everything—it’s to retain the information that drives better outcomes. Avoiding memory overload by filtering out inconsequential details ensures that the system remains focused and efficient.

Another best practice is leveraging summarization to refresh context. AI agents equipped with memory excel when they can seamlessly recall past interactions, but users may not always remember what the AI knows. Summarization bridges this gap, allowing users to align with the AI’s memory. Asking for a recap—such as “What are the key takeaways from our last session?” or “Can you summarize my priorities for this project?”—ensures continuity and alignment. Summaries act as checkpoints, helping users validate and correct the AI’s understanding as needed. In team settings, this becomes even more powerful. Imagine a marketing team using an AI assistant to manage campaigns. Before every weekly meeting, the AI could provide a summary of ongoing efforts, performance metrics, and lessons from past campaigns, saving time and keeping everyone on the same page.

Memory audits are another critical aspect of effective memory management. Just as humans periodically reflect to clarify thoughts, AI systems benefit from regular reviews of their memory. These audits help identify irrelevant, outdated, or incorrect information and allow users to refine what the AI retains. For example, a customer service AI might still hold on to a customer’s old address or obsolete purchasing habits. By reviewing and cleaning the AI’s memory, businesses can ensure accuracy and trustworthiness. Structured audit protocols, such as scheduled memory reviews with teams, can refine the AI’s priorities and align its memory with business goals.
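
A memory audit of the kind described here can be sketched as a pruning pass over stored entries. The record layout, the customer-address scenario, and the 180-day retention window are illustrative assumptions.

```python
from datetime import datetime, timedelta

MAX_AGE = timedelta(days=180)  # assumed retention window

memory = [
    {"key": "customer_address", "value": "12 Old Street", "updated": datetime(2023, 1, 5)},
    {"key": "customer_address", "value": "98 New Avenue", "updated": datetime(2024, 11, 2)},
    {"key": "preferred_channel", "value": "email", "updated": datetime(2024, 10, 1)},
]

def audit(entries, now):
    latest = {}
    for entry in entries:
        # Keep only the most recent value per key...
        current = latest.get(entry["key"])
        if current is None or entry["updated"] > current["updated"]:
            latest[entry["key"]] = entry
    # ...then drop anything older than the retention window.
    return [e for e in latest.values() if now - e["updated"] <= MAX_AGE]

cleaned = audit(memory, now=datetime(2024, 12, 1))
print([(e["key"], e["value"]) for e in cleaned])
```

The stale address is discarded while the current one survives, which is the accuracy-and-trust outcome the audit is meant to deliver.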

Balancing memory depth and privacy is perhaps the most sensitive aspect of managing AI memory. While memory enables deep personalization and contextual understanding, it also requires users to trust the AI with sensitive data. Transparency is non-negotiable in fostering this trust. Businesses must clearly communicate how memory is managed and offer users control over their data. This includes features that allow users to view what the AI remembers, edit or delete specific memories, and set boundaries for what the AI can retain. Prioritizing secure storage practices, such as compartmentalizing sensitive information and applying strict access controls, is essential.

From our experience, the successful implementation of memory in AI systems requires a proactive approach that combines relevance, summarization, auditing, feedback, and privacy. By adopting these best practices, businesses can transform AI memory into a powerful asset that enhances efficiency, builds trust, and drives meaningful outcomes. The journey doesn’t end with implementation—it’s an ongoing process of refinement, alignment, and ethical management to ensure that the AI system evolves alongside the needs of the organization.

Addressing Privacy Concerns and Ensuring Transparency

Managing privacy and ethical issues in AI memory systems is critical, as long-term memory retention involves handling sensitive user data. AI agents, especially those with memory capabilities, can collect and store information such as personal preferences, interaction histories, or even behavioral patterns. While these features enable personalization and improve user experiences, they also introduce significant risks to data privacy and compliance with regulations like the General Data Protection Regulation (GDPR) or the California Consumer Privacy Act (CCPA).

The first step in managing privacy is minimizing data collection to only what is essential. Leaders should adopt a privacy-by-design approach, ensuring data privacy is a core consideration from the start. For example, implement mechanisms that anonymize or pseudonymize data before it’s stored in long-term memory. By removing direct identifiers, even in the event of a breach, sensitive information is protected. Additionally, use techniques like data minimization, which restricts the storage of unnecessary or overly detailed data, and regular audits to identify and remove outdated or irrelevant information.
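
Pseudonymization before storage, as suggested here, can be sketched with a salted hash. The salt handling, field names, and event shape below are assumptions; a production system would manage the salt as a secret and follow its regulator's guidance.

```python
import hashlib

SALT = b"example-salt"  # assumption: in practice, a managed secret

def pseudonymize(user_id):
    # Replace a direct identifier with a stable, non-reversible token.
    return hashlib.sha256(SALT + user_id.encode()).hexdigest()[:16]

def store_interaction(raw_event):
    # Data minimization: keep only the fields the agent actually needs.
    return {
        "user": pseudonymize(raw_event["user_id"]),
        "intent": raw_event["intent"],
    }

event = {"user_id": "alice@example.com", "intent": "reschedule", "phone": "555-0100"}
record = store_interaction(event)
print(record["user"] != event["user_id"], "phone" in record)  # -> True False
```

The stored record still supports longitudinal learning (the token is stable per user), but a breach of the memory store alone exposes no direct identifiers.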

Compliance with regulations such as GDPR and CCPA is another essential aspect. These laws give users rights over their data, such as the ability to access, delete, or restrict how their information is used. AI memory systems must be designed to respect these rights. For instance, if a user asks an AI system to “forget” specific information, the system should have mechanisms to erase the corresponding data from its databases, including backups and long-term storage. Moreover, companies must ensure transparency, clearly communicating to users what data is being collected, how it will be used, and how long it will be retained.
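
An erasure ("right to be forgotten") request of the kind GDPR and CCPA mandate can be sketched as follows. The store layout is an assumption; real deployments must also purge backups, archives, and downstream copies, which the queue below only gestures at.

```python
class MemoryStore:
    def __init__(self):
        self.records = []      # primary long-term memory
        self.purge_queue = []  # user ids pending erasure from backups

    def remember(self, user_id, fact):
        self.records.append({"user_id": user_id, "fact": fact})

    def forget_user(self, user_id):
        # Drop the user from the live store and schedule the same
        # deletion against backups and long-term archives.
        self.records = [r for r in self.records if r["user_id"] != user_id]
        self.purge_queue.append(user_id)

store = MemoryStore()
store.remember("u1", "prefers morning appointments")
store.remember("u2", "asked about invoice history")
store.forget_user("u1")

print(len(store.records), store.purge_queue)  # -> 1 ['u1']
```

Pairing this with the transparency measures above (letting users see and edit what is remembered) closes the loop between the legal right and the technical mechanism.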

From a practical standpoint, robust encryption for both data at rest and in transit is non-negotiable. Leaders should invest in secure database technologies and follow industry best practices for data storage. Access to memory systems should be restricted to authorized personnel or processes, with activity logs maintained to track who accessed what data and when. Implementing these controls reduces the likelihood of unauthorized access or breaches, which can severely damage user trust and lead to regulatory penalties.

Ethical issues go beyond privacy to include concerns like bias and misuse of data. AI agents with memory capabilities might reinforce biases if they overly rely on historical data that reflects outdated or skewed perspectives. Leaders must ensure that their AI systems undergo continuous bias testing and validation. Additionally, organizations should establish clear policies around the ethical use of AI memory systems, including defining acceptable boundaries for data retention and personalization to prevent intrusive or manipulative practices.

In summary, managing privacy and ethical issues in AI memory is not just about compliance; it’s about building trust and fostering responsible AI use. By prioritizing secure data handling, aligning with regulations, and addressing ethical concerns proactively, leaders can ensure their AI systems are both effective and respectful of user rights and expectations.

平衡创新与隐私:医疗服务提供商的历程

Balancing Innovation and Privacy: A Healthcare Provider’s Journey

当一家领先的美国医疗保健提供商决定在 2024 年实施具有记忆功能的 AI 以进行患者护理协调时,他们面临着一个看似不可能的挑战:如何创建一个 AI 系统,该系统能够从患者互动中学习,同时保持严格的 HIPAA 合规性和患者隐私。

When a leading U.S. healthcare provider decided to implement memory-enabled AI for patient care coordination in 2024, they faced a seemingly impossible challenge: how to create an AI system that could learn from patient interactions while maintaining strict HIPAA compliance and patient privacy.

“风险再高不过了,”他们的首席隐私官告诉我们。“我们需要人工智能记住并从与患者的互动中学习,以便提供更好的医疗服务,但一次隐私泄露就可能造成毁灭性的后果。”

“The stakes couldn’t have been higher,” their Chief Privacy Officer tells us. “We needed our AI to remember and learn from patient interactions to provide better care, but one privacy breach could have devastating consequences.”

他们的解决方案结合了创新技术和严谨的管理。他们实施了一种多层内存架构,将患者身份信息与行为模式和临床见解严格分离。人工智能无需访问单个患者的详细信息,即可从聚合模式中学习,他们称之为“隐私保护学习模式”。

Their solution combined innovative technology with careful governance. They implemented a multi-layered memory architecture where patient-identifying information was strictly separated from behavioral patterns and clinical insights. The AI could learn from aggregated patterns without accessing individual patient details, using what they call “privacy-preserving learning patterns.”

技术上的实现仅仅是开始。他们建立了一个清晰的治理框架,其中包括:

The technical implementation was just the beginning. They established a clear governance framework that included:

对人工智能的内存系统进行定期隐私审计

Regular privacy audits of the AI’s memory systems

自动检测潜在隐私风险

Automated detection of potential privacy risks

明确的数据保留和删除协议

Clear protocols for data retention and deletion

患者可通过透明门户控制自己的数据

Patient control over their data through a transparent portal

他们的首席技术官反思道:“最让我们感到惊讶的是,这种以隐私为先的方法实际上提高了人工智能的有效性。通过关注模式而不是个别细节,该系统能够形成更强大、更具普适性的洞察。”

“What surprised us most,” their CTO reflects, “was how this privacy-first approach actually improved the AI’s effectiveness. By focusing on patterns rather than individual details, the system developed more robust and generalizable insights.”

实施一年后,结果显而易见:护理协调效率提升了40%,且未发生任何隐私泄露事件。更重要的是,通过定期调查测得的患者对人工智能系统的信任度高达87%,甚至高于他们对某些传统医院系统的信任度。

One year after implementation, the results spoke for themselves: a 40% improvement in care coordination efficiency, with zero privacy breaches. More importantly, patient trust in the AI system, measured through regular surveys, reached 87%—higher than their trust in some traditional hospital systems.

技术实施建议

Recommendations for Technical Implementation

要成功实施这些层级,必须结合合适的工具、策略和技术——每一种都有其独特的技术要求和挑战。基于我们的经验,我们编写了这份实用指南,旨在帮助您和您的技术团队应对这些复杂情况,其中包含所需的关键要素、我们观察到的常见陷阱以及相应的解决方案。

To succeed in implementing these layers, it’s essential to combine the right tools, strategies, and technologies—each with its unique technical requirements and challenges. Based on our experience, here’s a practical guide to help you and your technical teams navigate these complexities, complete with the enablers needed, common pitfalls we’ve observed, and solutions to overcome them.


表 7.1:实现三层内存的建议(来源:© Bornet 等人)

Table 7.1: Recommendations for implementing the three layers of memory (Source: © Bornet et al.)

人工智能内存框架中的每一层都建立在前一层的基础上,依靠数据库和内存功能之间的无缝集成才能高效运行。常见的陷阱,例如系统调优不佳或数据结构不完整,会导致内存检索效率低下或性能不稳定。缓解这些问题需要维护干净、组织良好的数据,为内存使用设定明确的优先级,并持续监控和改进系统。应对这些挑战可以确保人工智能保持高效、可扩展,并能适应实际应用。

Each layer in the AI memory framework builds upon the previous one, relying on seamless integration between databases and memory functions to operate effectively. Common pitfalls, such as poorly tuned systems or unstructured data, can lead to inefficiencies in memory retrieval or unreliable performance. Mitigation requires maintaining clean, well-organized data, setting clear priorities for memory use, and continuously monitoring and refining the system. Addressing these challenges ensures that AI remains efficient, scalable, and adaptable to real-world applications.

数据管理是成功的基石

Data Management as the Foundation of Success

数据管理是高效人工智能系统的基石,尤其对于那些具备记忆功能的系统而言更是如此。正如企业依靠井然有序的记录才能高效运营一样,人工智能代理也依赖于无缝的数据存储、检索和更新才能发挥最佳性能。如果底层数据基础设施管理不善,即使是最智能的人工智能也可能出现故障——检索到过时的信息、遗漏关键细节或输出结果不一致。对于企业而言,这可能意味着错失良机、客户不满以及信任度下降。

Data management is the backbone of effective AI systems, especially those equipped with memory capabilities. Just like a business relies on well-organized records to operate efficiently, an AI agent depends on seamless data storage, retrieval, and updating to perform at its best. If the underlying data infrastructure is poorly managed, even the smartest AI can stumble—retrieving outdated information, missing critical details, or delivering inconsistent outputs. For businesses, this can mean missed opportunities, frustrated customers, and diminished trust.

为了避免这些陷阱,领导者必须将数据管理视为战略重点。这首先要从投资可扩展且高效的数据库技术入手,以满足现代人工智能系统日益增长的需求。试想一下,一个人工智能客服代理需要在几秒钟内访问成千上万的客户档案、购买记录和过往互动信息。如果没有强大的基础设施,代理可能会响应迟缓或提供不完整的回复,从而损害用户所期望的流畅体验。云数据库、实时分析工具和用于语义搜索的向量存储等技术是使人工智能记忆系统敏捷高效的关键推动因素。

To avoid these pitfalls, leaders must view data management as a strategic priority. This starts with investing in scalable and efficient database technologies that can handle the growing demands of modern AI systems. Imagine an AI-powered customer service agent that needs to access thousands of customer profiles, purchase histories, and past interactions in seconds. Without a robust infrastructure, the agent could lag or deliver incomplete responses, eroding the seamless experience users expect. Technologies like cloud-based databases, real-time analytics tools, and vector storage for semantic searches are key enablers for making AI memory systems agile and effective.

数据库与人工智能系统之间的无缝集成是另一个关键环节。人工智能记忆系统依赖于与数据层的流畅、实时交互来检索最相关、最新的信息。这不仅仅关乎技术兼容性,更关乎如何设计工作流程,使数据能够在存储、处理和人工智能代理之间顺畅流动。例如,在零售环境中,用于推荐产品的人工智能系统需要即时整合最新的库存更新或顾客浏览行为,才能确保其推荐的相关性。如果没有这种动态连接,即使是最好的人工智能模型也会显得与时代脱节。

Seamless integration between databases and AI systems is another critical piece of the puzzle. AI memory systems rely on smooth, real-time interactions with the data layer to retrieve the most relevant and up-to-date information. This isn’t just about technical compatibility—it’s about designing workflows where data flows effortlessly between storage, processing, and the AI agent. For example, in a retail setting, an AI system recommending products needs to instantly incorporate the latest inventory updates or customer browsing behavior to make its suggestions relevant. Without this dynamic connectivity, even the best AI models can appear out of touch.

数据管理也需要持续关注,以确保数据的质量和准确性。糟糕的数据会导致错误的决策,无论是对人类还是人工智能而言。领导者必须建立流程,定期清理、更新和验证数据,确保人工智能学习和检索的信息既可靠又最新。此外,随着系统规模的扩大,领导者应考虑使用能够自动化这些流程的工具,以减少人为错误并简化操作。

Data management also requires vigilance to ensure quality and accuracy. Poor data leads to poor decisions, both for humans and AI. Leaders must establish processes to clean, update, and validate data regularly, ensuring that what the AI learns and retrieves is both reliable and current. Additionally, as systems scale, leaders should consider tools that automate these processes, reducing manual errors and streamlining operations.

从本质上讲,数据管理是人工智能创造价值的幕后驱动力。它不仅仅关乎拥有数据,更关乎如何组织、访问和更新数据,从而充分释放人工智能的潜力。明智地投资于数据基础设施的企业领导者,不仅能够确保系统面向未来,还能帮助企业在日益人工智能驱动的世界中蓬勃发展。

In essence, data management is the hidden force that powers AI’s ability to deliver value. It’s not just about having data; it’s about organizing, accessing, and updating it in ways that unlock the full potential of AI. Business leaders who invest wisely in their data infrastructure not only future-proof their systems but also position their companies to thrive in an increasingly AI-driven world.

记忆革命才刚刚开始。

The Memory Revolution Has Just Started

在本章中,我们探讨了记忆如何将人工智能从一个复杂的计算器转变为一个真正的思考伙伴。我们看到,短期记忆如何促成连贯的对话,而结构化的记忆则如何组织经验以进行长期学习。或许最为关键的是,我们理解了反馈回路如何驱动持续改进,使人工智能系统能够从每一次交互中学习和适应。

Throughout this chapter, we’ve explored how memory transforms AI from a sophisticated calculator into a genuine thinking partner. We’ve seen how short-term memory enables coherent conversations, while structured retention organizes experiences for long-term learning. Perhaps most crucially, we’ve understood how feedback loops drive continuous improvement, allowing AI systems to learn and adapt from every interaction.

这对企业领导者而言意义深远且迫在眉睫。首先,具备记忆功能的AI代理从根本上改变了客户互动的经济模式。当AI代理能够在对话中保持上下文关联、记住客户偏好并从过往互动中学习时,它们就能以极低的成本提供指数级提升的客户体验。我们已经看到,一些公司在降低客户服务成本40%至50%的同时,客户满意度也得到了显著提高。

The implications for business leaders are profound and immediate. First, memory-enabled AI agents fundamentally change the economics of customer interaction. When AI agents can maintain context across conversations, remember customer preferences, and learn from past interactions, they deliver exponentially better experiences at a fraction of the cost. We’ve seen this in practice with companies achieving 40-50% reductions in customer service costs while simultaneously improving satisfaction scores.

其次,记忆改变了决策过程。领导者不再仅仅依赖当前数据,而是可以利用人工智能代理,这些代理能够记住并学习组织内所有过去的决策、成功和失败。这种以往分散在电子邮件、文档和员工记忆中的机构记忆,如今变成了一种结构化、易于访问的资源,可以为战略和运营提供信息。

Second, memory transforms decision-making processes. Rather than relying solely on current data, leaders can now tap into AI agents that remember and learn from every past decision, success, and failure across the organization. This institutional memory, previously scattered across emails, documents, and employees’ minds, becomes a structured, accessible resource for informing strategy and operations.

第三,具备记忆功能的智能体正在重塑组织的学习和适应方式。当人工智能智能体能够记忆并分析数千个项目或数百万次客户互动中的模式时,它们就能识别出人类独自无法发现的机遇和风险。这并非取代人类的判断,而是以前所未有的模式识别和历史感知能力来增强人类的判断。

Third, memory-enabled agents reshape how organizations learn and adapt. When AI agents can remember and analyze patterns across thousands of projects or millions of customer interactions, they identify opportunities and risks that would be impossible for humans to spot alone. This isn’t replacing human judgment—it’s augmenting it with a level of pattern recognition and historical awareness previously unimaginable.

本章即将结束,请记住,人工智能记忆技术的发展不仅仅代表着技术进步,它更标志着我们与机器互动方式以及机器如何帮助我们思考世界的根本性转变。企业领导者面临的问题不是是否接受这种变革,而是如何塑造这种变革,使其既能创造价值,又能尊重人类的自主性和创造力。

As we close this chapter, remember that the development of memory in AI represents more than just technological progress—it marks a fundamental shift in how we interact with machines and how they help us think about the world. The question facing business leaders isn’t whether to embrace this transformation but how to shape it in ways that create value while respecting human agency and creativity.

第三部分

PART 3

利用人工智能代理进行创业和职业发展

ENTREPRENEURSHIP AND PROFESSIONAL GROWTH WITH AI AGENTS

 

 

既然我们已经探讨了行动、推理和记忆这三大基础要素,现在是时候进行实践操作了。我们究竟该如何构建这些系统?更重要的是,如何利用它们创造真正的价值?

Now that we’ve explored the foundational keystones of Action, Reasoning, and Memory, it’s time to turn to hands-on practice. How do we actually build these systems? And, more importantly, how can you leverage them to create real value?

在本书的前几部分中,我们带领读者从理解人工智能代理的概念,逐步探索它们的思考、行动和学习方式。我们看到,它们代表着与传统人工智能系统截然不同的根本性转变——它们不仅能够处理信息,还能代表我们自主地追求目标。然而,仅仅了解这项技术还不够。真正的问题是:如何驾驭它?

In the previous Parts of the book, we’ve taken you on a journey from understanding what AI agents are to discovering how they think, act, and learn. We’ve seen how they represent a fundamental shift from traditional AI systems—not just processing information but autonomously pursuing goals on our behalf. But knowing about this technology isn’t enough. The real question is: how can you harness it?

接下来的章节将为你提供所需的工具,将人工智能代理的变革潜力转化为切实可行的现实。未来属于那些不仅能够理解这项技术,而且能够有效运用它的人——而这正是我们即将向你展示的。

The coming chapters will equip you with the tools to turn the transformative potential of AI agents into tangible reality. The future belongs to those who can not only understand this technology but effectively implement it—and that’s exactly what we’re about to show you how to do.

在这里,我们将撸起袖子,脚踏实地。无论您是想变革您的组织,还是打造下一个百万美元级企业,这些章节都将为您提供从构思到实施的路线图。

Here, we roll up our sleeves and get practical. Whether you’re looking to transform your organization or launch the next million-dollar business, these chapters provide your roadmap from idea to implementation.

让我们开始从构思到实施的旅程。

Let’s begin our journey from ideas to implementation.

第八章

CHAPTER 8

构建成功人工智能代理的实用指南

A PRACTICAL GUIDE FOR BUILDING SUCCESSFUL AI AGENTS

“我们的电子报简直要了我们的命,”我们记得在一个周五深夜,一边埋头苦读几十篇文章,一边琢磨着如何写出能吸引读者的摘要,心里想着,“肯定有更好的办法。”一个月后,我们的代理系统接管了整个流程,订阅用户激增至30万,团队每周也腾出了40个小时用于创意工作。最棒的是什么?我们创作的内容质量比以往任何时候都好。

“Our newsletters are killing us,” we remember thinking one late Friday night, poring over dozens of articles, trying to craft summaries that would engage our readers. “There has to be a better way.” Fast forward one month: our agentic system was handling the entire process, our subscriber base had exploded to 300,000, and our team had reclaimed 40 hours a week for creative work. The best part? We were producing better content than ever before.

但这个故事与你息息相关:构建高效的人工智能代理并非取决于拥有最雄厚的预算或最先进的技术,而是取决于理解几个区分成功与失败的关键原则。在本章中,我们将通过自身在各行业实施人工智能代理的尝试、错误和突破,分享这些原则。

But here’s what makes this story relevant to you: building effective AI agents isn’t about having the biggest budget or the most advanced technology. It’s about understanding a few key principles that separate success from failure. In this chapter, we’ll share these principles through our own trials, errors, and breakthroughs in implementing AI agents across industries.

我们不仅会告诉你该怎么做,还会向你展示。通过真实的案例、实用的工具,以及对出错之处(以及我们如何改正)的坦诚描述,你将获得构建人工智能代理的蓝图,从而改变你和你的组织的工作方式。

We won’t just tell you what to do—we’ll show you. Through real examples, practical tools, and honest accounts of what went wrong (and how we fixed it), you’ll gain a blueprint for building AI agents that transform how you and your organization work.

第一步:寻找合适的代理机会

Step 1: Finding the Right Agentic Opportunities

想象一下,你身处一家快速发展的数字营销机构熙熙攘攘的办公室。创始人兼创意总监珍妮坐在办公桌前,周围环绕着多个屏幕。她正忙着在各种应用程序之间切换——调取社交媒体分析数据、查看营销活动效果、整理内容日历,并试图将所有信息汇总成客户报告。

Picture yourself in the bustling office of a fast-growing digital marketing agency. Jenny, the founder and creative director, sits at her desk surrounded by multiple screens. She’s frantically switching between applications—pulling social media analytics, checking campaign performances, organizing content calendars, and trying to compile everything into client reports.

珍妮向我们提出了一个有趣的挑战:“我的团队被日常琐事淹没了,”她解释说。“我们有很多才华横溢的创意人员,他们花费大量时间进行数据录入和报告生成,而不是专注于战略和创新。但我如何才能知道哪些任务真正适合交给人工智能代理呢?”

Jenny approached us with an intriguing challenge: “My team is drowning in routine tasks,” she explained. “We have brilliant creatives spending hours on data entry and report generation instead of strategy and innovation. But how do I know which tasks are really right for AI agents?”

这个问题——即如何着手开发人工智能代理——至关重要。我们已经了解到,成功往往更多地取决于选择正确的机遇,而非技术上的精湛程度。

This question—knowing where to start with AI agents—is crucial. We’ve learned that success often depends more on choosing the right opportunities than on technical sophistication.

首先,我们必须明确一个基本事实:人工智能代理并非万能灵药,无法解决所有问题。正如你不会用锤子来解决所有房屋维修问题一样,并非所有商业挑战都需要人工智能代理。事实上,我们最常见的误区之一就是企业家和企业高管在没有事先确定人工智能代理是否是合适工具的情况下,就急于部署它们。

Let’s start with a fundamental truth: AI agents aren’t magical solutions that can handle any task. Just as you wouldn’t use a hammer to fix every home repair problem, not every business challenge calls for an AI agent. In fact, one of the most common pitfalls we see is entrepreneurs and business executives rushing to implement agents without first determining if they’re the right tool for the job.

何时不应使用人工智能代理

When Not to Use AI Agents

首先,让我们明确哪些地方不适合部署人工智能代理。根据我们的经验,我们已经发现了一些危险信号。

Let us start by recognizing where not to deploy AI agents. Through our experience, we have identified several red flags.

首先,那些需要真正人类创造力或情商的任务通常应该继续由人来完成。在营销机构中,人工智能代理可以负责数据收集和基本报告,但创意营销策划和客户关系管理仍然牢牢掌握在人手中。尽管三级代理能够进行自然语言互动,但他们无法真正捕捉到引人入胜的营销活动所需的情感共鸣。

First, tasks that require genuine human creativity or emotional intelligence should generally remain human-driven. At the marketing agency, AI agents could handle data gathering and basic reporting, but creative campaign ideation and client relationship management remained firmly in human hands. While Level 3 agents can engage in natural language interactions, they cannot truly capture the emotional resonance needed for compelling marketing campaigns.

同样,需要理解更广泛的市场背景或基于不完整信息做出判断的战略决策,也应该由人类来完成。即使是第三级人工智能,也缺乏应对这些场景所需的复杂推理能力和市场直觉。

Similarly, strategic decision-making that requires understanding the broader market context or making judgment calls based on incomplete information should stay with humans. Even at Level 3, AI agents lack the sophisticated reasoning and market intuition necessary for these scenarios.

有些任务对人工智能代理来说过于复杂,难以有效处理。一家科技公司曾委托我们开发一个代理来管理其整个客户支持运营。虽然潜在影响巨大,但整个过程涉及太多独特的场景和情感互动。人工智能代理只有在其能力与所分配任务的复杂程度相匹配时才能发挥最佳性能。

Some tasks are simply too complex for AI agents to handle effectively. A technology company once asked us to build an agent to manage their entire customer support operation. While the potential impact was significant, the process involved too many unique scenarios and emotional interactions. AI agents perform best when their capabilities align with the complexity of the task they are assigned.

在其他情况下,人工智能代理可能缺乏做出关键决策的权限。一家金融服务公司希望人工智能代理自主做出投资决策。这不仅风险极高,而且明显违反了监管要求。因此,必须考虑人工智能代理是否拥有执行其所分配任务所需的适当权限。

In other cases, AI agents may lack the authority to make critical decisions. A financial services firm wanted an AI agent to make investment decisions autonomously. This was not only risky but also a clear violation of regulatory requirements. It is essential to consider whether an AI agent has the appropriate level of authority for the task it is assigned.

通过了解这些局限性,组织可以确保在人工智能代理能够创造价值的地方部署人工智能代理,同时在最重要的地方保持人工监督。

By understanding these limitations, organizations can ensure they deploy AI agents where they add value while keeping human oversight where it matters most.

智能体机遇三环

The Three Circles of Agentic Opportunity

我们开发了一种简单而强大的方法,称为“智能体机遇三环”,旨在帮助您找到实施智能体人工智能的最佳切入点。想象一下三个重叠的圆圈。您的人工智能代理的最佳应用场景就位于这三个圆圈的交点处。让我们结合我们营销机构的经验来详细解读一下。

We’ve developed a straightforward but powerful approach we call “The Three Circles of Agentic Opportunity” to help identify the perfect sweet spots to implement with agentic AI. Picture three overlapping circles. The sweet spot for your AI agents lies where these circles intersect. Let’s break this down through our marketing agency’s experience.


图 8.1:代理机会的最佳时机(来源:© Bornet 等人)

Figure 8.1: The Sweet Spot for Agentic Opportunities (Source: © Bornet et al.)

第一圈:高影响力——它重要吗?

Circle 1: High Impact - Will It Matter?

第一个圆圈代表那些如果实现自动化将会对贵组织产生重大影响的任务。考量很简单:如果将这个流程自动化,它是否会带来实质性的改变?影响不仅仅在于节省时间,更在于节省下来的时间能够让贵组织做哪些事情。

The first circle represents tasks that, if automated, would significantly impact your organization. The consideration is simple: if you automate this process, will it make a meaningful difference? Impact isn’t just about saving time—it’s about what that saved time enables your organization to do.

想想那些耗费您专业人才时间的日常琐事。也许您的销售团队花费大量时间更新客户关系管理系统(CRM)记录,而不是与客户建立联系。也许您的研究人员花费更多时间整理数据,而不是进行数据分析。又或许您的人力资源团队疲于处理日常申请,而无暇顾及员工发展。

Consider the routine tasks that consume your skilled professionals’ time. Perhaps your sales team spends hours updating CRM records instead of building relationships with clients. Maybe your researchers spend more time formatting data than analyzing it. Or your HR team might be buried in processing routine requests rather than focusing on employee development.

最具影响力的机会往往并非最复杂的流程。相反,应该寻找那些阻碍优秀员工发挥最佳水平的日常琐事。评估影响时,请问:如果这项任务明天就实现自动化,你的团队可以完成哪些工作?

The highest-impact opportunities often aren’t your most complex processes. Instead, look for the routine tasks that are preventing your best people from doing their best work. When evaluating impact, ask: If this task were automated tomorrow, what would your team be able to accomplish instead?

对于这家营销机构来说,每月的客户报告流程要耗费团队超过200个小时的时间。更重要的是,这些例行工作使得分析师无法进行客户真正需要的战略思考。

For the marketing agency, their monthly client reporting process was consuming over 200 hours of team time per month. More importantly, this routine work was preventing their analysts from doing the strategic thinking their clients really needed.

第二圈:可行性——能做到吗?

Circle 2: Feasibility - Can It Be Done?

第二个方面是当前的AI代理技术是否能够有效且安全地完成任务。可行性就像烹饪前检查食材是否齐全一样——你需要确保所有必需要素都具备才能成功。

The second circle is about whether current AI agent technology can actually handle the task effectively and safely. Think of feasibility like checking if you have the right ingredients before starting to cook—you need to ensure you have all the essential elements for success.

最适合自动化的流程通常具有以下特点:

The most feasible processes for automation typically have these characteristics:

清晰、一致的决策规则

Clear, consistent rules for making decisions

可访问的数据和系统

Accessible data and systems

可定义的成功标准

Definable success criteria

如果出现问题,后果可控。

Manageable consequences if something goes wrong

能够在结果影响运营之前进行验证。

Ability to verify results before they impact operations

关键在于寻找那些能够解释规则,且无需频繁使用“视情况而定”或“但有时……”之类的短语的流程。流程中例外情况和判断越多,就越不适合你的第一个人工智能代理项目。

The key is to look for processes where you can explain the rules without using phrases like “it depends” or “but sometimes...” too often. The more exceptions and judgment calls a process requires, the less suitable it is for your first AI agent project.

对于这家营销机构而言,报告流程具备自动化的所有关键要素,因此是可行的。首先,他们所需的数据可以通过API访问。此外,该流程遵循一致的规则,结果易于核查,而错误可以由人工处理。

For the marketing agency, the reporting process was feasible for automation because it had all the key ingredients. First, the data they needed was accessible through APIs. In addition, the process followed consistent rules, and the outcome could be easily checked, while errors could be managed by people.

第三圈:努力——值得吗?

Circle 3: Effort - Is It Worth It?

最后一个环节会考虑实施的实际层面——所需的资源、时间和组织变革。这不仅仅关乎技术复杂性,更关乎您的组织是否做好了变革的准备。

The final circle considers the practical aspects of implementation—the resources, time, and organizational change required. This isn’t just about technical complexity; it’s about your organization’s readiness for change.

请考虑以下事项:

Consider whether:

该过程有详细记录。

The process is well-documented

您的团队已做好准备并愿意适应

Your team is ready and willing to adapt

你可以从小规模开始,然后逐步扩大规模。

You can start small and scale up

潜在收益显然证明了这项投资的合理性。

The potential benefits clearly justify the investment

您可以在不中断核心运营的情况下进行实施。

You can implement without disrupting core operations

最好的入门项目通常可以分阶段实施,让你逐步建立信心和能力。

The best first projects can often be implemented in phases, allowing you to build confidence and capabilities gradually.

对 Jenny 来说,报告自动化很有吸引力,因为他们已经有了书面记录的流程,可以从小规模开始逐步扩大,而且团队渴望改变。

In Jenny’s case, automating reports was attractive because they already had documented procedures, they could start small and scale up, and the team was eager for change.

找到你的最佳状态

Finding Your Sweet Spot

理想的AI代理项目存在于这三个圆圈的重叠区域。对于Jenny的公司来说,报告自动化非常理想,因为它能节省数百小时宝贵的分析师时间(影响),流程定义明确且自动化安全(可行性),而且他们可以在不影响核心业务的情况下实施(投入)。

The ideal AI agent projects live where these three circles overlap. For Jenny’s agency, report automation was perfect because it would free up hundreds of hours of valuable analyst time (Impact), the process was well-defined and safe to automate (Feasibility), and they could implement it without disrupting their core business (Effort).

首先,对照这三个圆圈,梳理一下你自身的流程。找出团队反复抱怨的任务——这些往往预示着具有重大影响的改进机会。然后,评估这些任务是否拥有清晰的规则和易于获取的数据。最后,考虑你是否具备应对变革所需的资源和准备。

Start by mapping your own processes against these three circles. Look for tasks that your team repeatedly complains about—these often signal high-impact opportunities. Then, evaluate whether these tasks have clear rules and accessible data. Finally, consider whether you have the resources and readiness to tackle the change.

记住,你的第一个人工智能代理项目应该像一次美好的初次约会——既要足够雄心勃勃、令人兴奋,又不能复杂到很可能以失败告终。一开始就选择在所有三个方面都表现出色的项目,你就能积累成功经验,并为未来更雄心勃勃的项目奠定基础。

Remember, your first AI agent project should be like a good first date—ambitious enough to be exciting but not so complicated that it’s likely to end in disaster. Start with something that scores well across all three circles, and you’ll build both success and momentum for more ambitious projects to come.

实施现实检验:来自实地的经验教训

Implementation Reality Check: Lessons from the Field

我们的经验揭示了寻找合适机会的一些关键现实:

Our experience has revealed some crucial realities about finding the right opportunities:

首先,企业很少清楚自己真正需要哪些类型的代理。虽然他们经常会向我们提出具体的代理构想,但在近一半的情况下,这些并非最具价值的自动化机会。正因如此,我们的“三环框架”才显得至关重要——它能帮助企业突破固有思维,明确真正重要的因素。例如,当一位客户坚持要实现社交媒体发帖自动化时,我们的分析显示,自动化潜在客户筛选流程可以带来五倍的投资回报率。

First, companies rarely know which agents they actually need. While they often approach us with specific agent ideas, in nearly half the cases, these aren’t the most valuable automation opportunities. This is why our Three Circles framework is so critical—it helps cut through assumptions to identify what truly matters. When a client insisted on automating their social media posting, our analysis revealed that automating lead qualification would deliver five times the ROI.

其次,许多公司认为他们可以利用人工智能代理完全自动化所有工作岗位,但这完全是对人工智能工作原理的误解。人工智能代理并非员工——它们缺乏人类员工所具备的灵活性、适应性和判断力。

Second, many companies believe they can fully automate entire job roles with AI agents, but this is a fundamental misunderstanding of how AI works today. AI agents are not employees—they lack the flexibility, adaptability, or judgment that human workers bring to a role.

一名员工通常需要管理多个相互关联的任务,这些任务需要切换工作内容、做出决策和团队协作——而这些正是人工智能代理目前仍难以应对的挑战。与其思考“自动化角色”,不如思考“自动化任务”。

A single employee often manages multiple interconnected tasks that require context-switching, decision-making, and collaboration—things AI agents still struggle with. Instead of thinking in terms of “automating roles,” the right approach is to think in terms of “automating tasks.”

最后,完善的流程文档对于代理的实施来说是宝贵的资源。在评估项目机会时,我们发现利用现有的标准操作程序(SOP)可以显著缩短实施时间并提高成功率。例如,对于一家金融服务客户,他们精心记录的客户入职流程使我们只用了类似项目一半的时间就部署好了代理——在那个类似项目中,我们不得不从零开始绘制流程图。

Finally, well-documented processes are gold mines for agent implementation. When evaluating opportunities, we’ve found that leveraging existing documented standard operating procedures (SOPs) dramatically reduces implementation time and increases success rates. For one financial services client, their meticulously documented customer onboarding process allowed us to deploy an agent in half the time compared to a similar project where we had to map the process from scratch.

这些实地经验塑造了我们处理机会识别阶段的方式,强化了“三环”框架的重要性,同时增加了超越理论的实际考虑因素。

These field lessons have shaped how we approach the opportunity identification phase, reinforcing the importance of the Three Circles framework while adding practical considerations that go beyond theory.

实践练习:寻找你的代理机会

A Practical Exercise: Finding Your Agentic Opportunities

既然您已经了解了理论,接下来我们将通过一个简单的练习来了解我们与企业合作时常用的方法,以帮助他们找到最佳的智能化机会。我们之所以开发这种方法,是因为我们看到太多公司在没有进行充分分析的情况下就匆忙推行自动化,这往往导致资源浪费和团队沮丧。

Now that you understand the theory, let’s walk through a simple exercise we use with organizations to identify their best agentic opportunities. We developed this approach after seeing too many companies rush into automation without proper analysis, often leading to wasted resources and frustrated teams.

第一步:任务清单

Step 1: Task Inventory

首先,召集团队进行一次两小时的集中讨论。目标是列出一份详尽的、耗时过长或造成运营瓶颈的重复性任务清单。我们发现,提出具体问题比泛泛而谈更能取得成效。

Begin by gathering your team for a focused two-hour session. The goal is to create a comprehensive list of recurring tasks that consume significant time or create bottlenecks in your operations. We’ve found that asking specific questions yields better results than general brainstorming.

问问你的团队:

Ask your team:

“你发现自己在一周内反复做哪些任务?”

“What tasks do you find yourself doing repeatedly throughout the week?”

“哪些活动妨碍您专注于更具战略性的工作?”

“Which activities prevent you from focusing on more strategic work?”

“哪些流程总是会给我们的运营造成瓶颈?”

“What processes consistently create bottlenecks in our operations?”

“哪些日常任务需要最多的监督以防止出错?”

“Which routine tasks require the most oversight to prevent errors?”

当我们与这家营销机构一起进行这项练习时,他们的任务清单包括每月客户报告、社交媒体分析、营销活动效果跟踪、竞争对手监测和内容日历管理等。关键在于不仅要记录任务本身,还要记录任务的频率以及每项任务投入的大致时间。

When we ran this exercise with the marketing agency, their list included tasks like monthly client reporting, social media analytics, campaign performance tracking, competitor monitoring, and content calendar management. The key was capturing not just the tasks, but also their frequency and the approximate time invested in each.

步骤二:影响评估

Step 2: Impact Assessment

接下来,评估每项任务自动化后的潜在影响。我们使用一个简单而有效的评分系统,该系统考虑了多个因素。对于每项任务,请根据以下标准,按 1-5 分的等级进行评分:

Next, evaluate each task’s potential impact if automated. We use a simple but effective scoring system that considers multiple factors. For each task, score the below criteria on a scale of 1-5:

1.节省时间(今天需要花费多少时间才能节省下来?)

1. Time Saved (How much time does it take today that can be saved?)

大多数情况下,只需估算平均节省时间即可。分数越高(5分),表示可节省的时间越多。

In most cases, just estimating the average time saved is enough. A higher score (5) means the time saved would be very high.

在某些情况下,更详细地计算当前成本(以美元计)会很有帮助。将每次任务所需小时数乘以频率和每小时成本(例如,每次任务 2 小时 × 每月 20 次任务 × 每小时 50 美元 = 节省 2,000 美元)。

In some cases, going into more detailed calculations of the current costs in dollars can be useful. Multiply hours per instance by frequency and cost per hour (e.g., 2 hours per task × 20 tasks/month × $50/hour = $2,000 in savings).

2.节省时间的战略价值(机会成本是什么?)

2. Strategic Value of Freed Time (What’s the opportunity cost?)

战略价值衡量任务自动化带来的更广泛益处,包括其对员工满意度、客户体验、竞争地位和收入潜力的影响。

The strategic value measures the broader benefits of automating a task, including its impact on employee satisfaction, client experience, competitive positioning, and revenue potential.

评分等级为 1-5 分,分数越高表示对整体业务的影响越大。

It is rated on a scale of 1-5, with higher scores indicating greater overall business impact.

3.减少错误的可能性(错误发生的频率如何?)

3. Error Reduction Potential (How often do mistakes happen?)

大多数情况下,估算就足够了。分数越高(5分),意味着误差出现的频率越高或代价越大。

In most cases, an estimation is sufficient. A higher score (5) means the errors are highly frequent or costly.

在某些情况下,对当前成本(以美元计)进行更详细的计算可能很有用。可使用以下公式计算:当前错误率 × 每次错误成本 × 数量(例如,5% 的错误率 × 每次错误 100 美元 × 每月 1,000 次 = 节省 5,000 美元)。

In some cases, going into more detailed calculations of the current costs in dollars can be useful. Calculate it using the formula: current error rate × cost per error × volume (e.g., a 5% error rate × $100 per error × 1,000 instances/month = $5,000 savings).
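To make the arithmetic concrete, the two dollar-savings formulas from criteria 1 and 3 can be written as a few lines of Python; the input figures below are the chapter's own examples:

```python
def monthly_time_savings(hours_per_task, tasks_per_month, cost_per_hour):
    # Criterion 1: hours per instance × frequency × hourly cost
    return hours_per_task * tasks_per_month * cost_per_hour

def monthly_error_savings(error_rate, cost_per_error, volume_per_month):
    # Criterion 3: error rate × cost per error × volume
    return error_rate * cost_per_error * volume_per_month

print(monthly_time_savings(2, 20, 50))        # 2 h × 20 tasks × $50/h → 2000
print(monthly_error_savings(0.05, 100, 1000)) # 5% × $100 × 1,000 → 5000.0
```

These dollar figures can then be mapped back onto the 1-5 scale for comparison with the other criteria.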

4.可扩展性影响(这是否可以应用于整个组织?)

4. Scalability Impact (Can this be applied across the organization?)

可扩展性影响衡量的是解决方案在不以相同速度增加成本的情况下扩展的能力。得分越高(5分),意味着该任务可以自动化,并能以最小的额外成本在整个组织内扩展。

Scalability impact measures how well a solution can expand without increasing costs at the same rate. A higher score (5) means the task can be automated and scaled across the organization with minimal extra cost.

例如,一个适用于 10 个客户的自动化客户入职流程可以轻松扩展到 1,000 个客户,而无需增加员工,因此具有很高的可扩展性(评级为 5)。

For example, automating a client onboarding process that works for 10 clients can easily scale to 1,000 without needing additional staff, making it highly scalable (rated 5).

以下是四项评价标准及其分数含义的总结:

Here is a summary of the four criteria and the meaning of the scores:


标准

Criteria

评分范围为1至5分

Score on a scale from 1 to 5

时间投入

Time Investment

1:每周不到一小时

1: Less than an hour per week

3:每周数小时

3: Several hours per week

5:每天数小时

5: Multiple hours daily

空闲时间的战略价值

Strategic Value of Freed Time

1:时间的替代用途有限

1: Limited alternative use of time

3:中等战略价值

3: Moderate strategic value

5:高价值战略活动受阻

5: High-value strategic activities blocked

降低误差的潜力

Error Reduction Potential

1:错误发生率很低

1: Few errors occur

3:偶尔出现重大错误

3: Occasional significant errors

5:频繁或代价高昂的错误

5: Frequent or costly errors

可扩展性影响

Scalability Impact

1:一次性任务

1: One-off task

3:中等可重复性

3: Moderately repeatable

5:在整个组织内具有高度可扩展性

5: Highly scalable across the organization

表 8.1:影响评估四项标准的评分(来源:© Bornet 等人)

Table 8.1: Scoring the four criteria of impact assessment (Source: © Bornet et al.)

然后,计算每项任务的总分。将四个评分标准下的分数相加。最低分为 4 分,最高分为 20 分。

Then, calculate the total score for each task. Add up the scores across the four criteria. The minimum possible score is 4, and the maximum is 20.

最后,比较各项任务的得分——总分越高,自动化的理由就越充分。

Finally, compare scores across tasks—the higher the total score, the stronger the case for automation.

这家营销机构的客户报告流程在所有类别中得分都很高:耗时过长(5分),阻碍了战略分析(5分),由于手动数据录入而经常出错(4分),并且无法扩展到所有客户(5分)。总分达到19分(满分20分),使其成为自动化的理想选择。

At the marketing agency, their client reporting process scored high across all categories: it consumed massive time (5), prevented strategic analysis (5), frequently contained errors due to manual data entry (4), and could scale across all clients (5). This gave it a total score of 19 out of 20, making it a prime candidate for automation.
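表 8.1 中的影响评分就是四项标准得分之和。下面是一个简单的示意(参数名为我们自拟的标签,并非书中术语):

The impact score is simply the sum of the four criterion ratings. A minimal sketch (the parameter names are our own labels, not terms from the book):

```python
def impact_score(time_investment: int, strategic_value: int,
                 error_reduction: int, scalability: int) -> int:
    """Sum of the four impact criteria, each rated 1-5 (total range: 4-20)."""
    scores = [time_investment, strategic_value, error_reduction, scalability]
    assert all(1 <= s <= 5 for s in scores), "each criterion is rated 1 to 5"
    return sum(scores)

# The agency's client reporting process: 5 + 5 + 4 + 5
print(impact_score(5, 5, 4, 5))  # → 19
```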

步骤三:可行性评估

Step 3: Feasibility Assessment

对于每项影响较大的任务,评估其自动化的可行性。这有助于确定当前的人工智能代理技术(1-3级)是否能够可靠地处理该任务。我们将可行性分解为两个关键组成部分,每个部分均按1-5分进行评分。

For each task that scored highly on impact, evaluate its feasibility for automation. This helps determine whether current AI agent technology (Levels 1-3) can reliably handle the task. We break feasibility into two key components, each scored on a scale of 1-5.

1.流程标准化

1. Process Standardization

当任务步骤清晰且结构化时,自动化效果最佳。评估流程是否包含清晰的步骤、决策规则和例外情况。问问自己:“新手能否仅凭我们的文档就完全按照这个流程操作?”

Automation works best when tasks have clear and structured steps. Evaluate if the process has clear steps, decision rules, and exceptions. Ask: ‘Can someone new follow this process exactly using only our documentation?’

2.数据和系统访问

2. Data and System Access

人工智能需要结构化且易于访问的数据才能正常运行。如果数据分散、锁定在旧系统中或需要人工干预,自动化将难以实现。因此,需要测试数据是否结构化、易于检索,是否可通过现代系统访问,或者是否需要人工准备。

AI needs structured and accessible data to function. If data is scattered, locked in old systems, or requires human intervention, automation will be difficult. Test whether data is structured, easily retrievable, and available through modern systems or requires manual preparation.

请使用以下标准对流程标准化和数据准备情况进行评分:

Use the following scale to rate the process standardization and data readiness:

标准

Criteria

得分 1

Score 1

得分 3

Score 3

得分 5

Score 5

流程标准化

Process Standardization

1:该流程很大程度上是临时性的,没有标准方法。

1: Process is largely ad-hoc with no standard approach.

3:基本文档存在,但严重依赖员工经验。

3: Basic documentation exists but relies heavily on employee experience.

5:该过程有完整的文档记录,包括清晰的步骤、决策规则和例外情况。

5: The process is fully documented with clear steps, decision rules, and exceptions.

数据和系统访问

Data and System Access

1:重要数据被锁定在旧系统或纸质文件中。

1: Essential data is locked in legacy systems or paper-based.

3:数据可用,但需要大量准备工作。

3: Data is available but requires significant preparation.

5:所有数据都是结构化的,系统具有现代化的 API。

5: All data is structured, and systems have modern APIs.

表 8.2:可行性评估的两个标准评分(来源:© Bornet 等人)

Table 8.2: Scoring the two criteria of feasibility assessment (Source: © Bornet et al.)

通过对每个任务进行流程标准化、数据和系统访问方面的评分,您可以快速确定哪些高影响力任务已准备好进行自动化,哪些任务需要首先进行改进。

By scoring each task on Process Standardization and Data and System Access, you will quickly identify which high-impact tasks are ready for automation and which need improvements first.

第四步:实施工作

Step 4: Implementation Effort

一旦确定了高影响力任务并确认其可行性,下一步就是评估构建相应AI代理的难度。实施难度衡量了所涉及的技术复杂性,有助于您优先部署能够高效完成任务的AI代理。

Once high-impact tasks are identified and their feasibility is confirmed, the next step is evaluating how difficult it is to build an AI agent for them. Implementation effort measures the technical complexity involved, helping you prioritize AI agents that can be deployed efficiently.

技术复杂度评分方法(1-5分,反向计分)

How to Score Technical Complexity (1-5, Reverse Scored)

这是一个反向计分指标,意味着分数越低表示复杂度越高,实施难度越大。以下是评估方法:

This is a reverse-scored metric, meaning lower scores indicate higher complexity and greater difficulty in implementation. Here’s how to assess it:

评分 5(最容易实施)——该任务可以使用标准工具实现自动化,几乎不需要或完全不需要定制。

Score 5 (Easiest to Implement)—The task can be automated with standard tools that require little or no customization.

评分 4 分——需要进行一些定制,但它依赖于广泛使用的技术。

Score 4—Some customization is needed, but it relies on widely used technologies.

评分 3——该解决方案需要大量的定制开发,例如脚本编写。

Score 3—The solution requires significant custom development, such as scripting.

评分 2——需要复杂的集成或采用新技术。

Score 2—Complex integrations or new technology adoption is required.

评分 1(最难)——该任务需要尖端解决方案或广泛的研究和开发(R&D)。

Score 1 (Most Difficult)—The task demands cutting-edge solutions or extensive research and development (R&D).

我们建议始终先从成熟可靠的主流工具和平台入手,然后再去挑战技术边界。先自动化简单但影响巨大的任务,可以积累势头,降低风险,并确保在着手更复杂的解决方案之前取得早期成功。

Our recommendation is to always start with proven, mainstream tools and platforms before pushing technological boundaries. Automating simple, high-impact tasks first builds momentum, reduces risk, and ensures early wins before tackling more complex solutions.

汇总评估:最终分析

Putting It All Together: The Final Analysis

经过数月帮助各组织评估人工智能代理的可行性,我们开发了一套系统化的方法来做出最终的实施决策。让我们回到珍妮的营销公司,看看这套方法在实践中是如何运作的。

After spending months helping organizations evaluate AI agent opportunities, we’ve developed a systematic approach to making the final implementation decision. Let’s return to Jenny’s marketing agency to see how this works in practice.

首先,我们将之前分析的得分合并:

First, we combine the scores from our previous analyses:

影响评分(根据我们的四个标准,最高 20 分)

Impact Score (maximum 20 points from our four criteria)

可行性评分(流程和数据评估最高 10 分)

Feasibility Score (maximum 10 points from process and data evaluation)

实施工作量得分(最高 5 分,反向计分)

Implementation Effort Score (maximum 5 points, reverse scored)

这样一来,总分最高可达35分。我们发现这种详细的评分方式尤其有价值,因为它迫使组织系统地思考机遇的各个方面。

This gives us a total possible score of 35 points. We find this detailed scoring particularly valuable because it forces organizations to think through all aspects of the opportunity systematically.

珍妮的客户报告流程中,评分细则如下:

For Jenny’s client reporting process, the scores broke down like this:

影响:19/20(时间投入高、具有战略价值、减少错误、可扩展)

Impact: 19/20 (high time investment, strategic value, error reduction, and scalability)

可行性:8/10(流程文档齐全,可通过 API 获取数据)

Feasibility: 8/10 (well-documented process, accessible data through APIs)

实施难度:4/5(标准工具可用,只需少量定制)

Implementation Effort: 4/5 (standard tools available, minimal customization needed)

总分:31/35,表明它是人工智能代理实现的绝佳候选者。

Total Score: 31/35, indicating an excellent candidate for AI agent implementation.
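三项分析的合并计算可以示意如下(函数名为我们自拟):

Combining the three analyses can be sketched as follows (the function name is our own):

```python
def total_opportunity_score(impact: int, feasibility: int, effort: int) -> int:
    """Impact (max 20) + feasibility (max 10) + implementation effort
    (max 5, reverse scored: a higher score means an easier build).
    Maximum possible total: 35."""
    assert 4 <= impact <= 20 and 2 <= feasibility <= 10 and 1 <= effort <= 5
    return impact + feasibility + effort

# Jenny's client reporting process: 19 + 8 + 4
print(total_opportunity_score(19, 8, 4))  # → 31
```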

最后润色:智能体人工智能优先级矩阵

The Final Touch: The Agentic AI Prioritization Matrix

虽然综合评分很有帮助,但我们发现,将机会可视化有助于团队和管理层做出更好的决策。我们将潜在项目绘制在一个双轴矩阵上:

While the comprehensive score is helpful, we’ve found that visualizing opportunities helps teams and management make better decisions. We plot potential projects on a matrix with two axes:

纵轴:转型复杂性(可行性评分和工作量评分相结合)

Vertical Axis: Complexity of the Transformation (combining Feasibility and Effort scores)

横轴:业务影响(使用我们的影响评分)

Horizontal Axis: Business Impact (using our Impact score)

这就形成了四个象限:

This creates four quadrants:

1.快速见效(高影响、低复杂度):您理想的代理机会

1. Quick Wins (High Impact, Low Complexity): Your ideal agentic opportunities

2.战略项目(高影响力、高复杂性):需要精心规划的未来机遇

2. Strategic Projects (High Impact, High Complexity): Future opportunities requiring careful planning

3.低优先级(低影响、低复杂度):锦上添花的自动化功能

3. Low Priority (Low Impact, Low Complexity): Nice-to-have automations

4.避免(低影响,高复杂度):不值得付出努力

4. Avoid (Low Impact, High Complexity): Not worth the effort

对于这家营销机构而言,客户报告项目恰好属于“快速见效”象限:影响显著,复杂度相对较低。这成为他们的第一个人工智能代理项目,并最终促成了我们之前描述的成功转型。

For the marketing agency, client reporting landed squarely in the Quick Wins quadrant: high impact with relatively low complexity. This became their first AI agent project, leading to the successful transformation we described earlier.
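四个象限的划分逻辑可以用一小段 Python 示意。其中复杂度的换算方式和两个阈值均为我们自拟的假设,书中并未给出具体数值:

The quadrant logic can be sketched in a short Python function. The way complexity is derived from the two scores and both cutoff values are our own illustrative assumptions; the book does not prescribe specific numbers:

```python
def quadrant(impact: int, feasibility: int, effort: int,
             impact_cutoff: int = 12, complexity_cutoff: int = 7) -> str:
    """Classify a task on the Agentic AI Prioritization Matrix.
    Complexity is derived by inverting the feasibility (max 10) and
    effort (max 5) scores, so high ratings mean low complexity."""
    complexity = 15 - (feasibility + effort)  # 0 (easiest) .. 12 (hardest)
    high_impact = impact >= impact_cutoff
    high_complexity = complexity > complexity_cutoff
    if high_impact and not high_complexity:
        return "Quick Win"
    if high_impact:
        return "Strategic Project"
    if high_complexity:
        return "Avoid"
    return "Low Priority"

# Client reporting: impact 19, feasibility 8, effort 4 → complexity 3
print(quadrant(19, 8, 4))  # → Quick Win
```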

图像

图 8.2:智能体 AI 优先级矩阵(来源:© Bornet 等人)

Figure 8.2: The Agentic AI Prioritization Matrix (Source: © Bornet et al.)

关于遴选的最后说明

A Final Note on Selection

如果这是您首次实施智能体,请注意,它将为未来的自动化项目定下基调。因此,除了上述标准之外,选择一个合适的契机也至关重要:

If this is your first agentic implementation, note that it will set the tone for future automation initiatives. So, in addition to the above criteria, it is important to choose an opportunity that:

将在 3-6 个月内显示出清晰、可衡量的结果

Will show clear, measurable results within 3-6 months

影响人数众多,足以获得广泛支持。

Affects enough people to build broad support

有一位愿意负责监督实施的支持者

Has a champion willing to oversee the implementation

可以作为贵组织的一次学习经历。

Can serve as a learning experience for your organization

首个代理的成功能够增强信心,并为更具雄心的项目积蓄动力。这家营销机构成功的报告自动化,在接下来一年里带动了另外五个代理项目的实施,每一个都借鉴了首个项目的经验教训。

Success with your first agent builds confidence and creates momentum for more ambitious projects. The marketing agency’s successful reporting automation led to five more agent implementations over the next year, each building on lessons learned from the first.

步骤二:定义人工智能代理的角色和能力

Step 2: Defining AI Agents’ Role and Capabilities

在确定了人工智能代理的合适应用场景之后,下一步的关键是明确你需要哪种类型的代理。这就像撰写职位描述一样——你需要非常清楚地说明你的数字员工的角色、职责和所需能力。

After identifying the right opportunities for AI agents, the next crucial step is defining exactly what kind of agent you need. Think of this like writing a job description—you need to be crystal clear about the role, responsibilities, and required capabilities of your digital worker.

让我们回到这家数字营销机构,珍妮和她的团队决定首先实现每月客户报告流程的自动化。“我知道我想实现这个流程的自动化,”珍妮告诉我们,“但我不知道需要什么样的代理。是只需要提取数据的简单代理,还是能够生成报告分析的更高级的代理?”

Let’s return to the digital marketing agency, where Jenny and her team have decided to start with automating their monthly client reporting process. “I know I want to automate this process,” Jenny told us, “but I’m not sure what kind of agent I need. Should it be something simple that just pulls data, or something more sophisticated that can actually write report analyses?”

这是我们经常从企业领导者那里听到的问题,答案在于了解不同级别的 AI 代理能力,并将其与您的具体需求相匹配。

This is a common question we hear from business leaders, and the answer lies in understanding the different levels of AI agent capabilities and matching them to your specific needs.

了解智能体级别:从简单到复杂

Understanding Agent Levels: From Simple to Sophisticated

可以将人工智能代理视为技能水平不同的员工。利用我们的代理人工智能发展框架,我们可以将其大致分为五个级别,但目前生产环境中通常只部署 1-3 级。更准确地说,2 级(智能自动化)和 3 级(代理工作流)最为常见,因此我们将重点关注这两个级别。

Think of AI agents as employees with different skill levels. Using our Agentic AI Progression Framework, we can broadly categorize them into five levels, though currently, only levels 1-3 are commonly deployed in production environments. To be precise, Level 2 (Intelligent Automation) and Level 3 (Agentic Workflow) are the most common, so we will focus on these two levels.

二级智能体就像能够遵循简单、固定指令的熟练专业人士。三级智能体则像资深专业人士,能够理解上下文并处理更复杂的任务。

Level 2 agents are like experienced professionals who follow simple, unchanging instructions. Level 3 agents are like senior professionals who can understand context and manage more sophisticated tasks.

对于这家营销机构的报告流程,珍妮需要确定哪个层级最合适。让我们一起来看看他们是如何做出决定的。

For the marketing agency’s reporting process, Jenny needed to determine which level would be most appropriate. Let’s walk through the decision-making process they used.

决策框架

The Decision Framework

为了选择合适的代理级别,我们评估三个关键标准,以帮助确定 2 级代理还是 3 级代理更合适:

To choose the right agent level, we evaluate three key criteria that help determine whether a Level 2 or Level 3 agent is more appropriate:

1. 任务可预测性(确定性与概率性)

1. Task Predictability (Deterministic vs. Probabilistic)

首先,考虑一下你的流程有多可预测。二级智能体擅长处理确定性任务——遵循清晰、不变规则的流程。它们就像可靠的工人,每次都能完美地执行相同的步骤。而三级智能体则由基础模型驱动,能够处理概率性任务,这类任务需要理解上下文并做出判断。

First, consider how predictable your process is. Level 2 agents excel at deterministic tasks—processes that follow clear, unchanging rules. They’re like reliable workers who execute the same steps perfectly every time. Level 3 agents, powered by foundation models, can handle probabilistic tasks that require understanding context and making judgment calls.

在营销机构,珍妮的团队分析了他们的报告流程,发现其中约 80% 是确定性的——从各种平台提取特定指标并以标准格式整理。这部分工作非常适合二级代理。然而,剩下的 20% 涉及解读趋势和撰写分析报告,这需要概率思维,更适合三级代理。

At the marketing agency, Jenny’s team analyzed their reporting process and found that about 80% of it was deterministic—pulling specific metrics from various platforms and organizing them in a standard format. This portion was perfect for a Level 2 agent. However, 20% involved interpreting trends and writing insights, which required probabilistic thinking that is better suited for a Level 3 agent.

2. 误差敏感度

2. Error Sensitivity

接下来,评估错误对您的运营会造成多大的影响。二级代理非常可靠,尤其适用于对准确性要求极高的任务,因为它们严格遵循规则,不会偏离预设模式。它们非常适合财务计算、数据处理以及其他任何错误都可能造成巨大损失的任务。

Next, assess how critical errors would be to your operation. Level 2 agents are highly reliable for tasks where accuracy is paramount because they follow exact rules and won’t deviate from prescribed patterns. They’re ideal for financial calculations, data processing, and other tasks where errors could be costly.

三级智能体虽然更加灵活,但偶尔也会产生意想不到的输出或做出创造性的解读。它们更适合那些允许甚至有益于一定程度变化的任务,例如内容生成或模式分析。

Level 3 agents, while more flexible, may occasionally produce unexpected outputs or make creative interpretations. They’re better suited for tasks where some variation is acceptable or even beneficial, like content generation or pattern analysis.

对于这家营销机构的客户报告而言,数据收集和计算的准确性至关重要——任何一个错误都可能损害客户的信任。因此,二级代理非常适合负责数据处理部分。然而,书面分析中允许存在一些细微的偏差,有时甚至需要这种偏差,因此这部分工作更适合三级代理。

For the marketing agency’s client reporting, accuracy in the data gathering and calculations was crucial—a single error could damage client trust. This made Level 2 agents perfect for the data processing portion. However, slight variations in the written analysis were acceptable and sometimes even desirable, making this part suitable for a Level 3 agent.

3. 输入变异性

3. Input Variability

最后,评估一下你的输入数据变化有多大。二级智能体在处理标准化输入数据时表现最佳——它们需要一致的数据格式和可预测的场景。它们难以应对异常情况,也无法在不重新编程的情况下适应意外变化。

Finally, evaluate how much your inputs vary. Level 2 agents work best with standardized inputs—they need consistent data formats and predictable scenarios. They struggle with exceptions and can’t adapt to unexpected variations without being reprogrammed.

然而,三级智能体擅长处理各种不同的输入。它们能够理解上下文,适应不同的数据格式,并能理解非结构化数据。它们就像经验丰富的专业人士,可以根据具体情况调整处理方法。

Level 3 agents, however, excel at handling variable inputs. They can understand context, adapt to different formats, and make sense of unstructured data. They’re like experienced professionals who can adjust their approach based on the situation.

这家营销机构处理的数据来源多种多样,但格式统一——这对于二级代理来说非常合适。然而,不同客户和行业的数据解读背景差异很大,因此分析部分需要三级代理。

The marketing agency dealt with various data sources, but they followed consistent formats—perfect for a Level 2 agent. However, the context for interpreting this data varied significantly across different clients and industries, making a Level 3 agent necessary for the analytical portion.

将特征映射到代理级别

Mapping Characteristics to Agent Levels

总结一下这些标准如何映射到代理级别:

To summarize how these criteria map to agent levels:

标准

Criteria

二级代理(智能自动化)

Level 2 Agents (Intelligent Automation)

三级代理(代理工作流程)

Level 3 Agents (Agentic Workflow)

任务可预测性

Task Predictability

具有明确规则的高度确定性过程

Highly deterministic processes with clear rules

能够处理需要理解上下文并做出适应性反应的任务

Handles tasks requiring context understanding and adaptive responses

误差敏感度

Error Sensitivity

需要100%的准确度和可靠性

Requires 100% accuracy and reliability

允许输出结果存在一定波动,从而允许进行概率性决策。

Accepts some variability in outputs, allowing for probabilistic decisions

输入变异性

Input Variability

使用标准化的输入和最小的变化效果最佳

Works best with standardized inputs and minimal variations

能够处理可变或非结构化输入

Can process variable or unstructured inputs

最适合

Best For

工作量大、重复性高且遵循固定工作流程的任务

High-volume, repetitive tasks that follow fixed workflows

涉及推理、决策或自然语言理解的任务

Tasks that involve reasoning, decision-making, or natural language understanding

示例用例

Example Use Cases

数据提取、财务计算、合规性检查、结构化报告

Data extraction, financial calculations, compliance checks, structured reporting

报告撰写、客户支持聊天机器人、欺诈检测、趋势分析

Report writing, customer support chatbots, fraud detection, trend analysis

表 8.3:二级和三级人工智能代理的比较(来源:© Bornet 等人)

Table 8.3: Comparison between Level 2 and Level 3 AI agents (Source: © Bornet et al.)
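表 8.3 的决策框架可以粗略地归纳为一个判断函数。这是我们自己的简化示意,并非书中给出的正式规则:

The decision framework in Table 8.3 can be roughly condensed into a single rule of thumb. This is our own simplification for illustration, not a formal rule from the book:

```python
def recommend_agent_level(deterministic: bool, error_tolerant: bool,
                          variable_inputs: bool) -> int:
    """A Level 2 agent suits deterministic, error-sensitive work on
    standardized inputs; anything needing judgment, tolerance for
    variation, or unstructured inputs points to Level 3."""
    if deterministic and not error_tolerant and not variable_inputs:
        return 2
    return 3

# The agency's data-gathering portion: fixed rules, no tolerance for
# errors, standardized inputs → Level 2
print(recommend_agent_level(True, False, False))  # → 2
# The written analysis: judgment calls, variation acceptable → Level 3
print(recommend_agent_level(False, True, True))   # → 3
```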

做出选择:营销机构的决策

Making the Choice: The Marketing Agency’s Decision

在评估了这些标准之后,我们帮助这家营销机构做出了一个有趣的决定:他们将采用混合方法,在流程的不同部分分别使用 2 级和 3 级代理。

After evaluating these criteria, we helped the marketing agency make an interesting decision: they would implement a hybrid approach using both Level 2 and Level 3 agents for different parts of the process.

二级代理(我们称之为智能自动化)将处理工作的确定性部分:

The Level 2 agent (what we call Intelligent Automation) would handle the deterministic portions of the work:

访问各种平台(谷歌分析、社交媒体、广告平台)

Accessing various platforms (Google Analytics, social media, advertising platforms)

提取标准化数据集

Extracting standardized data sets

执行计算并创建可视化

Performing calculations and creating visualizations

生成基本报告结构

Generating the basic report structure

3 级代理(代理工作流)随后会:

The Level 3 agent (an Agentic Workflow) would then:

分析处理后数据的趋势

Analyze trends in the processed data

提出初步见解和建议

Generate initial insights and recommendations

撰写报告的叙述性章节草稿

Create draft narrative sections of the report

这种混合方法充分利用了每种智能体的优势。二级智能体能够可靠地处理精确、确定性的任务,准确率接近完美;而三级智能体则更适合处理数据解释和叙事生成这类更为细致的工作。

This hybrid approach capitalized on the strengths of each agent type. The Level 2 agent could reliably handle precise, deterministic tasks with near-perfect accuracy, while the Level 3 agent was suited for the more nuanced work of data interpretation and narrative generation.

在接下来的章节中,我们将概述如何设计和构建三级智能体。关于构建二级智能体,我们建议参考《智能自动化》一书。136

In the upcoming section, we will outline how to design and build Level 3 agents. For building Level 2 agents, we recommend referring to the Intelligent Automation book.136

学习曲线

The Learning Curve

值得注意的是,人工智能代理的实施通常是一个迭代过程。这家营销机构最初采用的是二级代理,仅负责数据收集和基本报告。待其运行顺畅且团队熟悉该技术后,他们才引入了三级代理用于内容生成。

It’s important to note that implementing AI agents is typically an iterative process. At the marketing agency, they started with the Level 2 agent handling just the data collection and basic reporting. Once this was working smoothly and the team was comfortable with the technology, they introduced the Level 3 agent for content generation.

珍妮回忆道:"起初,我想一次性实现所有功能的自动化。但先从基础数据工作入手,再逐步增加更复杂的功能,这帮助我们的团队完成了适应,也切实改变了我们对报告流程的思考方式。"

Jenny reflected, “At first, I wanted to automate everything at once. But starting with the basic data work and gradually adding more sophisticated capabilities helped our team adapt and actually shaped how we thought about our reporting process.”

结果与经验教训

Results and Lessons Learned

实施六个月后,该营销机构的人工智能代理系统能够以惊人的效率处理所有客户的报告:

Six months after implementation, the marketing agency’s AI agent system was processing reports for all clients with remarkable efficiency:

报告生成时间缩短至 45 分钟

Report generation time reduced to 45 minutes

数据准确率保持在99.95%

Data accuracy maintained at 99.95%

客户满意度评分提高了15%。

Client satisfaction scores increased by 15%

市场营销团队报告称,用于战略工作的时间增加了 60%。

Marketing team reported 60% more time for strategic work

珍妮回顾了他们的历程:“关键在于不要急于实现自动化。通过有条不紊的设计方法和逐步建立信任,我们创建了一个团队和客户都完全信任的系统。”

Jenny reflected on their journey: “The key was not rushing to automation. By taking a methodical approach to design and building trust gradually, we created a system that both our team and our clients trust completely.”

这家营销机构的报告系统取得了成功,为在整个组织内部署人工智能代理开辟了新的可能性。他们目前正在探索将其应用于营销活动优化、内容创作和客户旅程分析等领域。

The success of the marketing agency’s reporting system has opened new possibilities for AI agent implementation across their organization. They’re now exploring applications in campaign optimization, content creation, and customer journey analysis.

此次实施过程中汲取的经验教训凸显了周全设计的重要性,这种设计既要考虑技术能力,也要考虑人为因素。正如我们将在下一章探讨的那样,这些设计决策为成功实施和部署奠定了至关重要的基础。

The lessons learned from this implementation highlight the importance of thoughtful design that considers both technical capabilities and human factors. As we’ll explore in the next chapter, these design decisions lay the crucial groundwork for successful implementation and deployment.

展望未来

Moving Forward

在明确了二级和三级代理的角色之后,这家营销机构准备进入实施的详细设计阶段。清晰的职责和能力划分,对于接下来绘制流程图和建立绩效标准等步骤至关重要。

With clear roles defined for both their Level 2 and Level 3 agents, the marketing agency was ready to move into the detailed design phase of their implementation. The clear delineation of responsibilities and capabilities would prove crucial for the next steps of mapping process flows and establishing performance criteria.

请记住,我们的目标并非打造最复杂的智能体,而是设计一个能够可靠地创造价值,并能与您团队的工作流程无缝衔接的智能体。正如我们将在下一章探讨的那样,明确定义的角色和能力是成功实施的关键。

Remember, the goal isn’t to create the most sophisticated agent possible but rather to design one that reliably delivers value while integrating smoothly with your human team’s workflows. As we’ll explore in the next chapter, this foundation of clearly defined roles and capabilities is essential for successful implementation.

步骤三:设计成功的AI代理

Step 3: Designing AI Agents for Success

在找到合适的机遇并确定合适的代理类型之后,下一个关键阶段是设计人工智能代理以确保成功实施。这是理论与实践相结合的关键时刻,也是许多组织能否成功或陷入常见陷阱的转折点。让我们通过自身创建用于新闻通讯自动化的人工智能代理的经验来探讨这个问题。

After identifying the right opportunity and determining the appropriate type of agent, the next crucial phase is designing your AI agents for successful implementation. This is where theory meets practice, and where many organizations either set themselves up for success or stumble into common pitfalls. Let’s explore this through our own experience creating an AI agent for newsletter automation.

电子报挑战:一个真实案例研究

The Newsletter Challenge: A Real-World Case Study

作为人工智能领域的作者和意见领袖,我们发现自己每周都要花费大量时间来整理内容、撰写摘要,并为我们的社群制作新闻简报。这是一个劳动密集型的过程,虽然对我们的读者来说很有价值,但却挤占了我们进行更深入研究和写作的时间。正是在这种情况下,我们决定运用我们在人工智能代理方面的专业知识来解决这一难题。

As authors and influencers in the AI space, we found ourselves spending countless hours each week curating content, writing summaries, and creating newsletters for our community. It was a labor-intensive process that, while valuable for our readers, took time away from deeper research and writing. That’s when we decided to apply our expertise in AI agents to solve this challenge.

“我们当时每周几乎要花一天时间制作电子报,”帕斯卡尔回忆道。“我们知道肯定有更好的方法。” 结果如何呢?一套全新的人工智能代理系统,不仅自动化了大部分流程,还帮助我们在短短一个月内将电子报的订阅用户增长到超过30万。

“We were spending nearly one day each week just on newsletter creation,” Pascal recalls. “We knew there had to be a better way.” The result? A new AI agent system that not only automated much of the process but also helped us grow our new newsletter to over 300,000 subscribers in just one month.

第 9 章中,我们将讨论我们如何开发这种创新的智能体商业机会并将其转化为一项事业,重点介绍智能体人工智能在创造新的商业可能性方面的巨大潜力。

In Chapter 9, we discuss how we developed this innovative agentic business opportunity and transformed it into a venture, highlighting the immense potential of agentic AI to generate new business possibilities.

设计原则一:以终为始

Design Principle #1: Start with the End in Mind

在深入探讨技术规范之前,成功的代理设计始于明确的成功标准。“我们看到的最大错误之一,”Rakesh根据他的咨询经验分享道,“就是企业在没有定义成功标准的情况下,就直接跳到解决方案设计阶段。”

Before diving into technical specifications, successful agent design begins with clear success criteria. “One of the biggest mistakes we see,” Rakesh shares from his consulting experience, “is organizations jumping straight to solution design without defining what success looks like.”

明确成功标准至关重要,原因有三。首先,它为开发团队设定了明确的目标,并有助于确定功能优先级。其次,它为衡量投资回报率和系统性能建立了基准。第三,它有助于管理利益相关者的期望,并增强他们对新系统的信心。

Defining clear success criteria is essential for three key reasons. First, it provides clear targets for the development team and helps prioritize features. Second, it establishes benchmarks for measuring the return on investment and system performance. Third, it helps manage stakeholders’ expectations and build confidence in the new system.

如果没有明确的成功标准,我们可能会构建一个技术上令人印象深刻,但实际上并没有解决我们核心业务需求的系统。

Without clear success criteria, we might have built a system that was technically impressive but didn’t actually solve our core business needs.

成功标准应包括定量和定性指标:

Success criteria should include both quantitative and qualitative metrics:

量化指标可能包括效率提升、错误率降低或具体绩效目标。

Quantitative metrics might include efficiency gains, error reduction rates, or specific performance targets

定性指标通常侧重于用户体验、输出质量和系统可靠性。

Qualitative metrics often focus on user experience, quality of output, and system reliability

例如,在我们的电子报案例中,我们建立了以下指标:

For example, in our newsletter case, we established metrics like:

将创作时间从每周 15 小时减少到 3 小时

Reducing creation time from 15 to 3 hours per week

内容准确率保持在95%以上

Maintaining 95%+ accuracy in content

邮件打开率达到 40% 以上

Achieving 40%+ email open rates

但更重要的是,我们还围绕内容相关性和读者体验定义了定性成功标准。

But more importantly, we also defined qualitative success criteria around content relevance and reader experience.

设计原则二:了解你目前的状况

Design Principle #2: Understand Your Current State

我们观察到,“许多组织都想直接迈向未来,但了解您当前的流程对于成功设计代理至关重要。”

“Many organizations want to leap straight to the future,” we observe, “but understanding your current process is crucial for successful agent design.”

在设计人工智能代理之前,详细了解您当前的工作流程至关重要,原因有以下几点。首先,它可以揭示一些乍看之下可能并不明显的隐藏复杂性和依赖关系。其次,它有助于识别新系统需要解决的潜在瓶颈和效率低下之处。最后,它可以确保在自动化过程中不会忽略关键的业务规则和例外情况。

Understanding your current workflow in detail before designing AI agents is crucial for several reasons. First, it reveals hidden complexities and dependencies that might not be apparent at first glance. Second, it helps identify potential bottlenecks and inefficiencies that the new system should address. Finally, it ensures that critical business rules and exceptions aren’t overlooked in the automation process.

当我们开始绘制工作流程图时,我们发现原本以为简单的四步流程实际上包含了数十个需要自动处理的微小决策和例外情况。如果没有记录这些,我们构建的智能系统就会忽略关键的细节。

When we started mapping our workflow, we discovered that what we thought was a simple four-step process actually involved dozens of micro-decisions and exceptions that we handled automatically. Without documenting these, we would have built an agentic system that missed crucial nuances.

以下是我们目前在构建电子报刊流程中遇到的问题:

Here are the issues we faced with the existing process of building the newsletter:

耗时:手动搜索、汇总、整理和格式化需要大量精力,每天每周都要花费数小时。我们估计,每份简报每周的总工作量约为 10 小时。

Time-Consuming: Manual searching, summarizing, compiling, and formatting require significant effort and multiple hours each day and week. We estimated the total workload to be about 10 hours per week per newsletter.

容易出错:我们在重复性工作中容易出错,例如总结和格式化。这种情况发生过几次,对我们的声誉造成了严重损害。

Prone to Errors: We were likely to make mistakes in repetitive tasks, such as summarizing and formatting. This happened a few times and was very damaging to our reputation.

可扩展性问题:随着文章数量的增长,手动处理变得不可持续,需要花费数小时的工作时间。

Scalability Issues: As the volume of articles grows, the manual process becomes unsustainable, requiring hours of work.

质量不稳定:产出会因执行每项任务的个人而异,导致团队变动时结果不均衡。

Inconsistent Quality: Output used to vary depending on the individual performing each task, leading to uneven results when the team changed.

设计原则三:设计目标流程

Design Principle #3: Design the Target Process

在记录了我们现有的工作流程之后,我们没有简单地将旧流程自动化,而是以最终结果为导向,对其进行了彻底的重新设计。根据我们的经验,沿用过时的工作流程往往会导致效率低下,因为仅仅依靠自动化无法解决根本的设计缺陷。

After documenting our existing workflow, instead of simply automating the old process, we redesigned it entirely with the outcome in mind. Based on our experience, holding onto outdated workflows often leads to inefficiencies, as automation alone won’t fix fundamental design flaws.

正如沙伊尔常说的,“输入垃圾,输出垃圾。如果你将一个有缺陷的流程自动化,最终只会得到一个有缺陷的自动化流程。”

As Shail usually says, “Garbage in, garbage out. If you automate a flawed process, it will just result in flawed automation.”

相反,我们专注于想要实现的目标——一个精简、可扩展且零错误的电子报制作流程——并围绕这个目标构建了工作流程。新流程遵循结构化的每日和每周工作流程。

Instead, we focused on what we wanted to achieve—a streamlined, scalable, and error-free newsletter production process—and built the workflow around that goal. The new process follows a structured daily and weekly workflow.

每天应筛选出相关文章,总结关键观点,并以格式化的电子邮件形式发送供人工审核。审核通过的摘要应汇编成结构化文档,以确保质量控制。

Each day, relevant articles should be identified, key insights summarized, and a formatted email sent for human review. The selected summaries should then be compiled into a structured document, ensuring quality control.

周一,内容应排版成视觉效果精美的电子报,审核准确性,并最终定稿分发。

On Mondays, the content should be formatted into a visually polished newsletter, reviewed for accuracy, and finalized for distribution.

通过将流程分解为清晰明确的步骤,我们消除了不必要的复杂性,提高了质量保证水平,并确保了流畅的发布周期。关键在于:不要仅仅追求自动化——要重新思考、重新设计并优化,以实现最佳结果。

By breaking the process into clear, well-defined steps, we eliminated unnecessary complexity, improved quality assurance, and ensured a smooth publishing cycle. The key lesson: don’t just automate—rethink, redesign, and optimize for the best possible outcome.

设计原则四:选择合适的架构

Design Principle #4: Choose the Right Architecture

智能体设计中最关键的决策之一就是架构设计方法。“你可以把它想象成盖房子,”我们解释道,“你可以尝试创建一个包罗万象的大房间,也可以设计一些功能各异、和谐共存的空间。根据我们的经验,后一种方法几乎总是更有效。”

One of the most critical decisions in agent design is the architectural approach. “Think of it like building a house,” we explain. “You could try to create one massive room that serves every purpose, or you could design specialized spaces that work together harmoniously. In our experience, the latter approach almost always works better.”

这就引出了代理设计中的一个基本问题:是应该构建一个复杂的代理来处理所有事情,还是应该创建一个由专业代理组成的团队?

This brings us to a fundamental question in agent design: Should you build one complex agent to handle everything or create a team of specialized agents?

在设计我们的邮件营销自动化系统时,我们面临着一个关键抉择:是构建一个功能强大的单一代理来处理所有事务,还是创建一支由多个专业代理组成的团队?我们为多家公司实施人工智能系统的经验告诉我们,更简单、更专业的组件往往比复杂、一体化的系统效果更好。在本书的这一部分,您已经多次听到我们“一个工具,一个代理”的设计原则。接下来,我们将探讨促成我们成功设计的关键决策。

When designing our newsletter automation system, we faced a crucial decision: should we build one complex agent to handle everything or create a team of specialized agents? Our experience implementing AI systems for various companies has taught us that simpler, specialized components often work better than complex, monolithic ones. At this stage of the book, you have already heard several times our design principle of “one tool, one agent.” Let’s explore the key design decisions that led to our successful design.

“最初,我们曾想打造一个‘超级代理’,它可以处理从内容发现到新闻简报格式设置的所有工作,”我们的合著者之一拉凯什解释说。“但我们很快意识到,这就像要求一个人同时担任研究员、作家、编辑和设计师——这种情况很少能成功。”

“Initially, we were tempted to build one ‘super-agent’ that could handle everything from content discovery to newsletter formatting,” Rakesh, one of our co-authors, explains. “But we quickly realized this would be like asking one person to be simultaneously a researcher, writer, editor, and designer—it rarely works well.”

因此,我们组建了一支由专业人员组成的综合团队:

Hence, we designed a comprehensive team of specialized agents:

代理人姓名

Agent Name

代理人角色

Agent Role

搜索代理

Search Agent

在网络上查找相关文章

Finds relevant articles on the web

摘要代理

Summarization Agent

总结文章要点

Summarizes key points from the articles

电子邮件代理

Email Agent

每天通过电子邮件发送摘要

Sends summaries via email every day

编译器代理

Compiler Agent

根据选定的文章整理内容

Organizes content based on selected articles

新闻稿格式代理

Newsletter Formatting Agent

准备最终新闻稿

Prepares final newsletter

经理代理

Manager Agent

协调各代理并负责最终交付

Coordinates the other agents and handles final delivery

表 8.4:我们用于新闻简报自动化的专业 AI 代理团队(来源:© Bornet 等人)

Table 8.4: Our team of specialized AI agents for the Newsletter automation (Source: © Bornet et al.)

这种模块化方法使每个代理都能在其专业角色中发挥出色,同时通过协调互动保持系统的灵活性。

This modular approach allowed each agent to excel in its specialized role while maintaining system flexibility through coordinated interaction.
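表 8.4 中"一个工具,一个代理"的流水线结构可以用如下极简代码示意。类名、方法名和占位任务均为我们自拟;真实系统中,每个代理的 run() 背后会封装一个大语言模型或自动化工具:

The "one tool, one agent" pipeline from Table 8.4 can be sketched minimally as below. The class names, method names, and placeholder tasks are our own; in a real system, each agent's run() would wrap an LLM or automation tool:

```python
class Agent:
    """A specialized agent: one focused task, exposed via run()."""
    def __init__(self, name, task):
        self.name = name
        self.task = task  # a function: input -> output

    def run(self, data):
        return self.task(data)

class ManagerAgent:
    """Coordinates the specialized agents by running them in sequence,
    passing each agent's output to the next."""
    def __init__(self, agents):
        self.agents = agents

    def run(self, data):
        for agent in self.agents:
            data = agent.run(data)
        return data

# Placeholder tasks standing in for web search, summarization, compiling
pipeline = ManagerAgent([
    Agent("Search", lambda topic: [f"article about {topic}"]),
    Agent("Summarize", lambda articles: [a.upper() for a in articles]),
    Agent("Compile", lambda summaries: " | ".join(summaries)),
])
print(pipeline.run("agentic AI"))  # → ARTICLE ABOUT AGENTIC AI
```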

指导我们设计的关键原则:

Key Principles That Guided Our Design:

我们在设计多智能体系统时遵循了六项核心原则:

We followed six core principles in designing our multi-agent system:

1. 目标明确:“每个代理人都需要像一位技艺精湛的专家,”拉凯什解释说。“就像你不会要求心脏外科医生同时也是麻醉师一样,我们为每位代理人设定了清晰明确的职责。”

1. Clarity of Purpose: “Each agent needs to be like a skilled specialist,” Rakesh explains. “Just as you wouldn’t ask a heart surgeon to also be an anesthesiologist, we gave each agent a clear, focused role.”

2. 通过专业化提高效率:我们没有构建能够胜任多项工作的复杂代理,而是创建了能够出色地完成一项工作的简单代理。

2. Efficiency Through Specialization: Rather than building complex agents that could do many things adequately, we created simple agents that could do one thing exceptionally well.

3. 可扩展性设计: “我们设计的代理系统能够应对不断增长的工作负载,”Rakesh 分享道。例如,如果内容量增加,我们可以引入多个并行工作的搜索代理,或者优化摘要代理以加快处理速度。

3. Scalability By Design: “We designed the system of agents to handle increasing workloads,” Rakesh shares. For example, if content volume increases, we can introduce multiple Search Agents working in parallel or optimize the Summarization Agent for faster processing.

4. 自主运行:每个代理在其领域内独立运行,根据明确的标准做出决策,而无需持续监督。

4. Autonomous Operation: Each agent operates independently within its domain, making decisions based on clear criteria without needing constant oversight.

5. 无缝协作:代理商们像一支协调良好的团队一样一起工作,采用标准化的沟通协议和清晰的交接点。

5. Seamless Collaboration: The agents work together like a well-coordinated team, with standardized communication protocols and clear handoff points.

6. 集中式协调:我们的管理代理确保所有组件协调运作,就像指挥家指挥交响乐团一样。

6. Centralized Orchestration: Our Manager Agent ensures all components work in harmony, much like a conductor leading an orchestra.

设计原则五:构建人机协作机制

Design Principle #5: Build in Human-AI Collaboration

智能体设计中最关键的方面或许在于确定人类和人工智能如何协同工作。“我们的目标不是将人类排除在外,”南丹强调说,“而是创造一种共生关系,让人类和人工智能都能发挥各自独特的优势。”

Perhaps the most crucial aspect of agent design is determining how humans and AI will work together. “The goal isn’t to remove humans from the process,” Nandan emphasizes, “but to create a symbiotic relationship where both humans and AI contribute their unique strengths.”

当我们考虑到人工智能代理目前的局限性时,这一原则就显得尤为重要。在我们提出的发展框架的第 1-3 级,代理仍然缺乏真正的自适应学习和复杂的推理能力。这意味着人类的专业知识在以下方面仍然至关重要:

This principle becomes particularly important when we consider the current limitations of AI agents. At Levels 1-3 of our Progression Framework, agents still lack true adaptive learning and sophisticated reasoning capabilities. This means human expertise remains essential for:

1. 战略决策:人类擅长理解背景并做出细致入微的判断。

1. Strategic Decision-Making: Humans excel at understanding context and making nuanced judgments

2. 质量保证:提供监督并发现细微错误

2. Quality Assurance: Providing oversight and catching subtle errors

3. 异常处理:管理超出正常参数范围的意外情况

3. Exception Handling: Managing unexpected situations that fall outside normal parameters

4. 持续改进:识别系统增强的机会

4. Continuous Improvement: Identifying opportunities for system enhancement

在最初设计新闻简报系统时,我们面临一个根本性问题:哪些任务应该自动化,哪些任务应该保留人工?我们的思路是,不是思考哪些任务“可以”自动化,而是思考哪些任务“应该”自动化。

When we first designed our newsletter system, we faced a fundamental question: Which tasks should be automated, and which should remain in human hands? Our approach was to think not in terms of what “could” be automated but what “should” be automated.

我们发现了两个需要人工参与的关键点:

We identified two critical points where human input was essential:

第一个关键的人为接触点出现在人工智能代理收集并总结内容之后。一位人工编辑会审核每日摘要,并从中挑选出最相关的文章发布到新闻简报中。这一决定需要了解受众的需求,识别新兴趋势,并对内容价值做出细致入微的判断——这些能力仍然是人类独有的。

The first key human touchpoint comes after our AI agents have gathered and summarized content. A human editor reviews the daily summaries and selects the most relevant pieces for the newsletter. This decision requires understanding our audience’s needs, recognizing emerging trends, and making nuanced judgments about content value—capabilities that remain uniquely human.

在实施过程中,我们发现了一个有趣的现象:当让人工智能做出最终的内容选择时,新闻简报在技术上变得精准无误,但却缺乏灵魂。它们缺少了人工策划所带来的战略叙事和主题连贯性。通过让人工参与这一关键决策,我们得以保持新闻简报独特的风格和战略相关性。

During implementation, we discovered an interesting pattern: when we allowed the AI to make final content selections, the newsletters became technically accurate but somehow soulless. They lacked the strategic narrative and thematic coherence that comes from human curation. By keeping humans in this crucial decision-making role, we maintained the newsletter’s distinctive voice and strategic relevance.

第二个关键的人工环节发生在发布前。此时,编辑会审阅整份电子报,在需要的地方添加背景信息,并确保内容符合我们的编辑标准。这最后的审核不仅仅是为了发现错误,更是为了确保电子报能为读者提供真正的价值。

The second critical human touchpoint occurs just before publication. Here, human editors review the complete newsletter, add context where needed, and ensure the content aligns with our editorial standards. This final review isn’t just about catching errors—it’s about ensuring the newsletter provides genuine value to our readers.

“这种人机协作的方式,”南丹解释说,“让我们能够兼顾两者的优势。智能体处理耗时的任务,而人类则可以专注于需要判断力和行业专业知识的战略决策。”

“This human-in-the-loop approach,” Nandan explains, “gives us the best of both worlds. The agents handle the time-consuming tasks, while humans focus on strategic decisions that require judgment and industry expertise.”

结果:目标智能体系统

The Outcome: The Target Agentic System

经过几天的努力,我们终于完成了。以下是我们设计的新目标流程,展示了关键活动的流程、负责人以及流程中的人员:

After a few days of work, here we are. Below is the new target process we have designed, presenting the flow of the key activities, the agents in charge, and the human in the loop:

日常工作流程

Daily Workflow

1. 搜索代理:在网络上搜索相关文章,并将链接发送给摘要代理。

1. Search Agent: Searches the web for relevant articles and sends links to the Summarization Agent.

2. 摘要代理:将每篇文章总结为三个要点,并将摘要发送给电子邮件发送代理。

2. Summarization Agent: Summarizes each article into three key points and sends summaries to the Email Delivery Agent.

3. 电子邮件发送代理:将摘要编译成格式化的电子邮件,并将其发送给人工审核员。

3. Email Delivery Agent: Compiles summaries into a formatted email and sends it to the human reviewer.

4. 人工审核员:审核每日邮件,并回复推荐文章。

4. Human Reviewer: Reviews daily emails and replies with preferred articles.

5. 编译代理:将每日选定的摘要编译成结构化的 Google 文档并执行质量保证。

5. Compiler Agent: Compiles daily selected summaries into a structured Google Doc and performs QA.

每周工作流程

Weekly Workflow

1. 简报格式代理:每周一,代理会将 Google 文档格式化成美观的简报。

1. Newsletter Formatting Agent: On Mondays, the agent formats the Google Doc into a visually appealing newsletter.

2. 经理代理:审核格式化后的新闻稿,并将其发送给人工审核员。

2. Manager Agent: Reviews the formatted newsletter and sends it to the human reviewer.

3. 人工审核员:进行最终审核并将新闻稿发布到指定平台。

3. Human Reviewer: Performs final review and posts the newsletter to desired platforms.
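The daily flow above can be sketched as a simple orchestration script. Everything here is an illustrative stand-in — the agent functions are stubs, not calls to a real platform.

```python
# Illustrative sketch of the daily newsletter workflow described above.
# All agent functions are hypothetical stubs standing in for real agents.

def search_agent():
    """Searches the web for relevant articles (stubbed with sample links)."""
    return ["https://example.com/ai-trends", "https://example.com/llm-agents"]

def summarization_agent(links):
    """Summarizes each article into three key points (stubbed)."""
    return {link: [f"Key point {i} for {link}" for i in range(1, 4)]
            for link in links}

def email_delivery_agent(summaries):
    """Compiles summaries into a formatted email body for the human reviewer."""
    lines = []
    for link, points in summaries.items():
        lines.append(link)
        lines.extend(f"  - {p}" for p in points)
    return "\n".join(lines)

def compiler_agent(selected, summaries):
    """Compiles the human-selected summaries into a structured document."""
    return {link: summaries[link] for link in selected}

def run_daily_workflow(human_selection):
    links = search_agent()
    summaries = summarization_agent(links)
    email_body = email_delivery_agent(summaries)  # sent to the human reviewer
    # The human reviewer replies with preferred articles (human_selection).
    doc = compiler_agent(human_selection, summaries)
    return email_body, doc
```

The human touchpoint sits between the email delivery and the compiler step, mirroring the review gate in the workflow above.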

第四步:实施您的人工智能代理

Step 4: Implementing Your AI Agents

选择合适的AI代理平台:速度优先于完美

Choosing the Right AI Agent Platform: Prioritize Speed Over Perfection

人工智能代理平台市场发展迅速,目前约有 400 家供应商提供复杂程度和易用性各不相同的解决方案。这种选择过多虽然为企业提供了众多选择,但也可能导致决策瘫痪。通过我们在各个行业部署人工智能代理的工作,我们发现,企业往往花费数月时间评估平台,而他们的竞争对手却已经抢先部署并获得了宝贵的市场优势。

The AI agent platform market has evolved rapidly, with approximately 400 vendors now offering solutions across different levels of sophistication and ease of use. This abundance of choice, while providing numerous options for organizations, can also lead to decision paralysis. Through our work implementing AI agents across various industries, we’ve observed that companies often spend months evaluating platforms while their competitors forge ahead with implementations and gain valuable market advantages.

选择合适的平台需要了解当今市场上的三大主要类别:全代码解决方案、低代码解决方案和无代码解决方案。每种类别都满足不同的组织需求,并在灵活性、实施速度和所需技术专长之间各有优劣。

Selecting the right platform requires understanding the three main categories available in today’s market: full-code, low-code, and no-code solutions. Each category serves different organizational needs and comes with its own set of trade-offs between flexibility, speed of implementation, and required technical expertise.

无代码平台:快速部署和易用性

No-Code Platforms: Rapid Deployment and Accessibility

可以将无代码平台想象成人工智能代理开发的“乐高积木”。它们提供预构建的组件,无需任何技术专长即可轻松组装。Bizway、Beam、N8N 和 Relevance AI 等解决方案使业务用户能够通过直观的界面创建功能齐全的人工智能代理,而无需编写任何代码。这些平台非常适合需要快速部署基础到中等复杂程度代理的组织,或者希望在投资更复杂的解决方案之前验证人工智能代理价值的组织。

Think of no-code platforms as the “LEGO blocks” of AI agent development. They provide pre-built components that you can assemble without any technical expertise. Solutions like Bizway, Beam, N8N, and Relevance AI enable business users to create functional AI agents through intuitive interfaces without writing any code. These platforms are ideal for organizations that need to quickly implement basic to moderate complexity agents or those looking to prove the value of AI agents before investing in more sophisticated solutions.

虽然无代码平台与技术性更强的同类平台相比似乎功能有限,但它们通常可以通过配置而非编写代码提供令人惊讶的强大功能。例如,Relevance AI 通过用户友好的界面提供基于向量的处理和复杂的搜索功能等高级特性。

While no-code platforms might seem limited compared to their more technical counterparts, they often provide surprisingly sophisticated capabilities through configuration rather than coding. For example, Relevance AI offers advanced features like vector-based processing and sophisticated search capabilities through a user-friendly interface.

全代码平台:最大程度的控制和定制

Full-Code Platforms: Maximum Control and Customization

另一方面,全代码平台虽然提供了最高级别的控制和灵活性,但也需要大量的开发资源。这些平台,例如 LangGraph、CrewAI 和 AutoGen,提供了全面的框架,使组织能够构建高度定制化的 AI 代理。它们尤其适合那些需要严格控制数据、需要与现有系统进行复杂集成或在监管严格的行业中运营的企业。

At the other end of the spectrum, full-code platforms provide the highest level of control and flexibility but require significant development resources. These platforms, such as LangGraph, CrewAI, and AutoGen, offer comprehensive frameworks that allow organizations to build highly customized AI agents. They’re particularly well-suited for enterprises that need to maintain strict control over their data, require complex integrations with existing systems, or operate in heavily regulated industries.

使用全代码平台时,企业可以精细调整人工智能代理行为的方方面面,从决策过程到数据处理协议。这种程度的控制是以更长的开发周期和对专业技术知识的需求为代价的。虽然像 LangGraph 这样的平台提供了丰富的文档和日益壮大的社区支持,但学习曲线仍然陡峭。

When working with full-code platforms, organizations can fine-tune every aspect of their AI agents’ behavior, from decision-making processes to data-handling protocols. This level of control comes at the cost of longer development cycles and the need for specialized technical expertise. While platforms like LangGraph offer extensive documentation and growing community support, the learning curve remains steep.

这些复杂的代理需要精心协调多种工具和复杂的流程管理,而全代码平台能够出色地处理这些任务。然而,企业应仔细考虑这种程度的控制对于其用例是否真的必要,因为额外的开发开销可能会显著延迟实施和价值实现时间。

These sophisticated agents require careful orchestration of multiple tools and complex workflow management, which full-code platforms handle adeptly. However, organizations should carefully consider whether this level of control is truly necessary for their use case, as the additional development overhead can significantly delay implementation and time to value.

低代码平台:平衡的方法

Low-Code Platforms: The Balanced Approach

低代码平台在定制化和易用性之间取得了平衡。WatsonX Assistant、Agentforce、UiPath 和 Microsoft Copilot 的 Agent Builders 等平台提供可视化开发环境,同时在需要时仍允许通过代码进行大量定制。这些平台擅长促进业务用户和技术团队之间的协作,因此对于以下类型的组织而言尤为有效:需要在保持一定程度的定制化的同时,快速部署人工智能代理。

Low-code platforms strike a balance between customization and accessibility. Platforms like WatsonX Assistant, Agentforce, UiPath, and Microsoft Copilot’s Agent Builder provide visual development environments while still allowing for significant customization through code when needed. These platforms excel at enabling collaboration between business users and technical teams, making them particularly effective for organizations that need to rapidly deploy AI agents while maintaining some level of customization.

低代码方法显著减少了实施所需的时间和技术要求,同时仍能为大多数企业用例提供足够的灵活性。这些平台通常提供预构建的组件和集成,可以根据特定的业务需求进行定制。例如,ServiceNow 的虚拟代理平台为常见的业务流程提供即用型组件,同时允许组织在需要时通过自定义开发扩展功能。

The low-code approach significantly reduces the time and technical expertise required for implementation while still providing enough flexibility for most enterprise use cases. These platforms typically offer pre-built components and integrations that can be customized to fit specific business needs. For instance, ServiceNow’s Virtual Agent platform provides ready-to-use components for common business processes while allowing organizations to extend functionality through custom development when needed.

选择时需要考虑的关键因素

Making the Selection: Key Considerations

为了做出自信高效的决策,与其无休止地比较功能,不如专注于构建一个实用的框架。首先,根据业务优先级,确定三到四个不可妥协的关键要素。为每个要素赋予权重,考虑易用性、可定制性、集成性、可扩展性、安全性以及成本等因素。然后,根据这些因素对潜在平台进行评分,选择评分最高的平台,并继续推进下一步。

To make a confident and efficient decision, focus on a practical framework rather than endless feature comparisons. Start by identifying your top three to four non-negotiables based on your business priorities. Assign weights to each criterion, considering factors like ease of use, customization, integration, scalability, security, and cost. Then, score potential platforms based on these factors, select the highest-ranking one, and move forward.
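The weighted-scoring approach described above can be sketched in a few lines. The criteria, weights, and platform scores here are made-up examples to show the mechanics, not a recommendation.

```python
# Minimal weighted-scoring sketch for platform selection, as described above.
# Criteria weights and platform scores are illustrative examples only.

def rank_platforms(weights, scores):
    """Return platforms sorted by weighted score, best first."""
    return sorted(
        scores.items(),
        key=lambda item: sum(weights[c] * item[1][c] for c in weights),
        reverse=True,
    )

weights = {"ease_of_use": 0.3, "integration": 0.3, "scalability": 0.2, "cost": 0.2}
scores = {
    "Platform A": {"ease_of_use": 9, "integration": 6, "scalability": 7, "cost": 8},
    "Platform B": {"ease_of_use": 6, "integration": 9, "scalability": 8, "cost": 5},
}
```

Once the top-ranked platform is identified, the advice above applies: pick it and move forward rather than re-running the comparison indefinitely.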

技术专长是否充足通常是首要的决定因素。拥有强大开发团队的组织自然会倾向于全代码解决方案。然而,我们观察到,即使是技术实力雄厚的组织,有时也能从低代码或无代码解决方案入手,从而更快地实现并积累实践经验,然后再转向更复杂的实施方案。

Technical expertise availability often serves as a primary deciding factor. Organizations with strong development teams might naturally gravitate toward full-code solutions. However, we’ve observed that even technically capable organizations sometimes benefit from starting with low-code or no-code solutions to implement faster and gain practical experience before moving to more complex implementations.

另一个关键考量因素是通过公民开发在整个组织内扩展代理开发规模的潜力。无代码和低代码平台支持一种民主化的方法,使组织内的员工都能参与构建和定制代理。这种代理开发的民主化可以显著加速人工智能计划的影响,并催生更多创新解决方案,因为了解特定技术的员工能够更好地参与其中。业务挑战可以直接促成解决方案的创建。例如,客服代表可以创建一个代理来处理日常咨询,而财务分析师可以开发一个代理来自动生成报告。这种分布式代理开发方法可以显著提升人工智能在组织内的价值和应用范围。

Another critical consideration is the potential for scaling agent development across the organization through citizen development. No-code and low-code platforms enable a democratized approach where employees throughout the organization can participate in building and customizing agents. This democratization of agent development can dramatically accelerate the impact of AI initiatives and lead to more innovative solutions, as employees who understand specific business challenges can directly contribute to creating solutions. For instance, a customer service representative might create an agent to handle routine inquiries, while a finance analyst might develop an agent to automate report generation. This distributed approach to agent development can significantly multiply the value and reach of AI within an organization.

与现有系统和数据源的集成需求在平台选择中也起着至关重要的作用。虽然大多数平台都提供 API 连接,但集成的便捷性和深度却差异很大。例如,Microsoft Copilot 的 Agent Builder 可与 Microsoft 生态系统无缝集成,因此对于那些大量投资于 Microsoft 技术的组织而言,它是一个极具吸引力的选择。

Integration requirements with existing systems and data sources also play a crucial role in platform selection. While most platforms offer API connectivity, the ease and depth of integration vary significantly. Microsoft Copilot’s Agent Builder, for instance, provides seamless integration with the Microsoft ecosystem, making it an attractive choice for organizations heavily invested in Microsoft technologies.

安全性和合规性要求会显著影响平台选择,尤其对于受监管行业的组织而言更是如此。全代码平台通常能提供对数据处理和安全协议的最大控制权,而无代码平台在这些方面可能存在局限性。

Security and compliance requirements can significantly influence platform choice, particularly for organizations in regulated industries. Full-code platforms typically offer the most control over data handling and security protocols, while no-code platforms might have limitations in these areas.

避免过度思考

Avoid Overthinking

没有哪个平台是完美的,一味等待“理想”方案可能会错失良机。最佳策略是从小规模做起,不断试验,边学边扩展。许多成功的公司都是从低代码平台上的简单人工智能代理入手,验证其效果,然后随着需求的演变逐步迁移到更高级的解决方案。

No platform is perfect, and waiting for the “ideal” choice can lead to lost opportunities. The best strategy is to start small, experiment, and scale as you learn. Many successful companies begin with a simple AI agent on a low-code platform, validate its impact, and gradually migrate to more advanced solutions as their needs evolve.

当你还在犹豫不决时,你的竞争对手已经开始部署人工智能代理,变革他们的行业。行动迅速的企业将引领人工智能革命,而犹豫不决的企业将难以追赶。最佳的人工智能战略并非完美无缺,而是你从今天开始实施的战略。

While you are still deciding, your competitors are already deploying AI agents that are transforming their industries. The businesses that act fast will lead the AI revolution, while those that hesitate will struggle to catch up. The best AI strategy is not the perfect one—it’s the one you start today.

***

***

在接下来的章节中,我们将结合自身经验,创建一份跨平台通用指南,重点介绍构建高效AI代理的关键成功因素、细致的决策以及真正发挥作用的隐藏功能。其优势在于您可以独立于所选平台使用此指南;然而,不足之处在于它并非针对特定平台,您可能会注意到不同平台在术语和功能细节方面存在一些差异。不过,为了便于说明,我们选择提供一份在低代码平台Relevance AI上构建代理的详细分步指南,您可以在本书的附录中找到该指南。

In the following sections, we’ve used our experience to create a universal guide that works across platforms, highlighting the critical success factors, nuanced decisions, and hidden features that truly make a difference in building effective AI agents. The advantage is that you can use it independently of the platform you choose; however, the drawback is that it is not platform-specific, and you might notice some variations in the terminology and details of functionalities. Yet, as an illustration, we have chosen to provide a detailed step-by-step guide for building an agent on the low-code platform Relevance AI, which you will find in the appendices of the book.

构建高效的AI代理:AGENT框架

Building Effective AI Agents: The A.G.E.N.T. Framework

根据我们的经验,我们发现成功往往不在于技术的复杂程度,而在于我们如何清晰、全面地定义和构建人工智能代理。经过多次实践,也经历了一些惨痛的教训,我们开发出了所谓的“代理框架”(AGENT framework)——一种构建可靠且真正能创造价值的人工智能代理的综合方法。

From our experience, we’ve learned that success often lies not in the sophistication of the technology but in the clarity and thoroughness of how we define and structure our AI agents. Through numerous implementations and, admittedly, some painful lessons, we’ve developed what we call the A.G.E.N.T. framework—a comprehensive approach to building reliable AI agents that actually deliver value.

该框架由五个关键组成部分构成:

The framework consists of five critical components:

代理人身份(代理人是谁?)

Agent Identity (Who is the agent?)

装备与大脑(是什么驱动着特工?)

Gear & Brain (What powers the agent?)

执行与工作流程(代理如何工作?)

Execution & Workflow (How does the agent work?)

导航与规则(智能体如何做出决策?)

Navigation & Rules (How does the agent make decisions?)

测试与信任(我们如何改进和扩展代理?)

Testing & Trust (How do we improve and scale the agent?)

将构建人工智能代理想象成招聘和培训新员工。你不会在没有明确定义其角色和职责、没有建立合适的工作流程、工具和规则的情况下,就将某人招入你的组织。同样的原则也适用于人工智能代理,而且更为重要,因为这些数字员工需要极其精确的指令才能高效运作。

Think of building an AI agent as hiring and training a new employee. You wouldn’t bring someone into your organization without clearly defining their role, establishing their responsibilities, and setting up the proper workflows, tools, and rules. The same principle applies to AI agents, but it’s even more important because these digital workers require extremely precise instructions to function effectively.

让我们深入了解该框架的第一个关键组成部分:代理身份。

Let’s dive deep into the first crucial component of the framework: Agent Identity.

A – 代理人身份:代理人是谁?

A – Agent Identity: Who is the Agent?

构建人工智能代理时,最关键的一步是定义它的身份——它的用途、角色和运行范围。而这正是大多数人犯的第一个错误。他们急于求成,渴望看到人工智能的实际运行效果,却忽略了仔细定义代理应该做什么。结果呢?人工智能的行为难以预测,结果前后矛盾,或者根本无法创造价值。

The single most important step when building an AI agent is defining its identity—its purpose, role, and operational scope. This is where most people make their first mistake. They rush ahead, eager to see their AI in action, without taking the time to carefully define what the agent is supposed to do. The result? An AI that behaves unpredictably, produces inconsistent results, or simply fails to deliver value.

为了说明这一步骤为何如此关键,不妨想象一下招聘新员工。你会把人招进公司后说:“你看着办吧,怎么干都行”吗?当然不会。新员工需要明确的指导:他们的角色职责权限限制。人工智能代理也是如此。如果不给它们设定结构化的身份,它们就会漫无目的地游荡,最终产生的结果往往不可靠、无关紧要,甚至适得其反。

To illustrate why this step is so crucial, imagine hiring a new employee. Would you bring someone into your company and say, “Just help out however you think best”? Of course not. A new hire needs clear guidance: their role, their responsibilities, and their limitations. The same applies to AI agents. If you don’t give them a structured identity, they’ll wander aimlessly, often producing results that are unreliable, irrelevant, or even counterproductive.

为什么定义代理人的身份至关重要

Why Defining an Agent’s Identity is Critical

人工智能体并非直觉型思考者,它们没有直觉或常识,只能在你设定的约束条件下运行。如果你未能正确定义它们的身份,将会面临以下三个方面的后果:

AI agents are not intuitive thinkers. They don’t have gut feelings or common sense. They operate within the constraints you set for them. If you fail to define their identity properly, you will see the consequences in three ways:

1.行为不可靠——由于缺乏方向,代理可能会做出与您的目标不符的回应。

1. Unreliable Behavior—The agent may respond in ways that don’t align with your goals because it lacks direction.

2.不一致——同一个问题每次都可能产生不同的结果,因为代理人没有明确的指导方针。

2. Inconsistency—The same question may yield different results each time because the agent doesn’t have clear guidelines.

3.缺乏控制——如果一个代理的范围没有明确定义,它可能会产生不必要的、无关的,甚至是有害的输出。

3. Lack of Control—If an agent’s scope isn’t clearly defined, it might produce unnecessary, irrelevant, or even harmful outputs.

为了更好地理解这一点,不妨看看我们为新闻简报构建人工智能摘要助手的过程。最初,我们给助手下了一个简单的指令:

To see this in action, consider our journey in building an AI-powered summarization assistant for our newsletter. Initially, we gave the agent a simple directive:

“总结新闻文章和研究论文。”

“Summarize news articles and research papers.”

听起来合情合理,对吧?但结果却一团糟。人工智能随机抓取文章,毫无章法地进行总结,有时甚至还包含过时或无关的来源。这并非人工智能的错——它只是在执行模糊的指令。

Sounds reasonable, right? But the results were a mess. The AI pulled random articles, summarized them with no clear structure, and sometimes even included outdated or irrelevant sources. It wasn’t the AI’s fault—it was just following vague instructions.

因此,我们完善了代理的身份,并赋予其更精确的角色:

So, we refined the agent’s identity and gave it a more precise role:

“您是一位专注于人工智能和商业趋势的摘要助理。您的工作是从MIT Tech Review、arXiv和Harvard Business Review等来源获取新闻文章和研究论文,并将其提炼成清晰、引人入胜的摘要。每篇摘要必须少于150字,抓住关键要点,并保持原文含义。摘要的语气应专业而通俗易懂,符合我们新闻通讯的风格。”

“You are a summarization assistant specializing in AI and business trends. Your job is to take news articles and research papers from sources like MIT Tech Review, arXiv, and Harvard Business Review and distill them into clear, engaging summaries. Each summary must be under 150 words, capture key insights, and maintain the original meaning. The tone should be professional yet accessible, aligning with our newsletter’s style.”

凭借这种清晰度,我们的人工智能从一个不可靠的内容聚合器转变为一个精准的研究伙伴。它现在提供简洁、高价值的摘要,让我们的读者能够掌握最新动态,而无需我们费力筛选海量信息。您可以参考附录,其中提供了该代理人的详细身份信息。您可以将其用作模板来撰写您自己的摘要。

With this level of clarity, our AI went from an unreliable content aggregator to a laser-focused research partner. It now delivers concise, high-value summaries that keep our readers ahead of the curve without us having to sift through mountains of information. You can refer to the appendix, where we provide you with this agent’s detailed identity. You can use it as a template for writing your own.
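An identity like the one above can also be kept as structured data and rendered into a system prompt, which makes it easier to review and reuse. The field names and rendering function below are illustrative, not a real platform schema.

```python
# Sketch: capturing the refined agent identity above as structured data
# that is rendered into a system prompt. Field names are illustrative.

AGENT_IDENTITY = {
    "role": "summarization assistant specializing in AI and business trends",
    "sources": ["MIT Tech Review", "arXiv", "Harvard Business Review"],
    "constraints": [
        "each summary must be under 150 words",
        "capture key insights",
        "maintain the original meaning",
    ],
    "tone": "professional yet accessible",
}

def render_system_prompt(identity):
    """Turn the structured identity into a system-prompt string."""
    return (
        f"You are a {identity['role']}. "
        f"Use only these sources: {', '.join(identity['sources'])}. "
        f"Rules: {'; '.join(identity['constraints'])}. "
        f"Tone: {identity['tone']}."
    )
```

Keeping the identity in one structured place means a change to, say, the word limit propagates to every prompt that uses it.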

构建强大的AI代理:像管理者一样思考

Building a Strong AI Agent: Think Like a Manager

为了更直观地理解这一点,不妨想象一下管理一位无法独立思考、只会服从指令的远程员工。你需要:

To put this all into perspective, imagine managing a remote employee who can’t think independently—they only follow instructions. You’d need to:

1.给他们一份清晰的职位描述

1. Give them a clear job description

2.明确告诉他们应该(和不应该)做哪些任务。

2. Tell them exactly what tasks they should (and shouldn’t) do

3.明确他们应该如何沟通

3. Define how they should communicate

4.设定界限和升级点

4. Set boundaries and escalation points

现在,用你的AI代理替换掉那位员工。如果你不定义这些要素,你的AI就会不可靠。但如果你定义了这些要素,你就能拥有一个在其领域内始终如一、高效且智能地运行的代理。

Now, replace that employee with your AI agent. If you don’t define these things, your AI will be unreliable. But if you do, you’ll have an agent that performs consistently, efficiently, and intelligently within its domain.

在选择模型和工作流程之前,请务必花时间明确您的智能体的身份。这个基础越完善,您的人工智能智能体就会越强大、越高效。

Before you move forward to choosing models and workflows, take the time to write down your agent’s identity. The better this foundation, the more powerful and effective your AI agent will be.

强大代理身份的关键要素

The Key Elements of a Strong Agent Identity

定义人工智能代理的身份不仅仅是描述其功能,它需要精准性,就像撰写一份精心设计的职位描述一样。

Defining an AI agent’s identity is more than just stating its function. It requires precision, just like writing a well-crafted job description.

1. 目的:特工的主要任务是什么?

1. Purpose: What is the Agent’s Primary Mission?

每个人工智能体都应该有一个清晰的使命宣言,回答“这个智能体存在的意义是什么?”这个问题。一个强有力的使命宣言不会留下任何歧义。

Every AI agent should have a clear mission statement that answers the question: Why does this agent exist? A strong purpose statement leaves no room for ambiguity.

目标不够明确:“协助客户支持。”

Weak Purpose: “Help with customer support.”

明确的目标:“以专业、友好的方式,通过循序渐进的解决方案帮助客户解决常见的技术问题。”

Strong Purpose: “Assist customers by resolving common technical issues with step-by-step solutions in a professional, friendly manner.”

目标越明确,代理的输出就越符合用户的期望。

The stronger the purpose, the more aligned the agent’s outputs will be with user expectations.

2. 角色:代理人扮演什么角色?

2. Role: What Persona Does the Agent Assume?

智能体的角色定义了它与用户的交互方式以及它所模拟的专业知识——本质上,就是它的专业身份。明确的角色定义能够确保用户快速理解其功能和局限性,同时也为智能体的行为范围划定了清晰的界限。

An agent’s role defines how it interacts with users and the expertise it simulates—essentially, its professional identity. A well-defined role ensures users instantly grasp its capabilities and limitations, while also providing clear boundaries for what the agent should and shouldn’t do.

例如,金融人工智能代理可能是:

For example, a financial AI agent might be a:

“财务分析师”(提供投资见解和风险评估)

“Financial Analyst” (providing investment insights and risk assessments)

“个人理财教练”(帮助用户制定预算和省钱)

“Personal Finance Coach” (helping users budget and save money)

“税务顾问”(以合规为原则解答税务相关问题)

“Tax Consultant” (answering tax-related questions with compliance in mind)

这些角色对人工智能的沟通方式和关注的信息类型都有不同的要求。

Each of these roles will require the AI to communicate differently and focus on different kinds of information.

3. 范围:代理人可以做什么,应该避免什么?

3. Scope: What Can the Agent Do, and What Should It Avoid?

如果没有明确的界限,人工智能代理可能会偏离其预期功能。设定明确的范围限制可以确保它始终专注于此。

Without clear boundaries, an AI agent may drift beyond its intended function. Setting well-defined scope constraints ensures it stays focused.

例如,客服人员的职责可能仅限于:

For example, a customer service agent might be restricted to:

根据公司知识回答常见问题

Answering FAQs based on company knowledge

提供常见问题的故障排除步骤

Providing troubleshooting steps for common problems

将复杂问题上报给人类代表

Escalating complex issues to human representatives

它不应该尝试:

It should not attempt to:

提供法律或医疗建议

Provide legal or medical advice

做出未经授权的决定

Make unauthorized decisions

生成推测性回应

Generate speculative responses

界限可以防止代理人涉足那些犯错可能代价高昂的领域。

Boundaries prevent the agent from venturing into areas where mistakes could be costly.
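The in-scope and out-of-scope lists above translate naturally into a routing guardrail that runs before the agent acts. The topic lists and string matching below are deliberately simplistic placeholders; a production system would typically use a classifier rather than exact keyword lookups.

```python
# Sketch of a scope guardrail: check each request against the agent's
# allowed, escalation, and forbidden topics before acting.
# Topic lists are illustrative placeholders.

ALLOWED = {"faq", "troubleshooting"}
ESCALATE = {"refund", "complaint"}
FORBIDDEN = {"legal", "medical"}

def route_request(topic):
    topic = topic.lower()
    if topic in FORBIDDEN:
        return "decline"            # entirely outside the agent's scope
    if topic in ESCALATE:
        return "escalate_to_human"  # complex issue -> human representative
    if topic in ALLOWED:
        return "handle"
    return "escalate_to_human"      # default to a human when unsure
```

Note the default: anything unrecognized goes to a human, which keeps the agent from generating speculative responses in gray areas.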

G – 齿轮与大脑:为您的 AI 代理提供动力

G – Gear & Brain: Powering Your AI Agent

一旦你的AI代理拥有了清晰的身份,下一步就是为其配备合适的装备和大脑——也就是有效运行所需的工具、模型和知识。很多开发者正是在这里犯错。他们要么因为选择了不必要的复杂配置而使事情变得过于复杂,要么因为选择了不合适的工具而导致代理功能不足。

Once your AI agent has a clearly defined identity, the next step is equipping it with the right gear and brain—the tools, models, and knowledge it needs to function effectively. This is where many builders go wrong. They either overcomplicate things by choosing advanced setups they don’t need or underpower their agents by selecting the wrong tools for the job.

把这一步骤想象成组装一辆高性能汽车。你不会把赛车引擎装进送货车里,也不会指望家用轿车赢得一级方程式赛车。同样的道理也适用于人工智能代理——选择错误的人工智能模型、工具和数据源组合会导致效率低下、性能不佳,甚至彻底失败。

Think of this step as assembling a high-performance vehicle. You wouldn’t put a racing engine in a delivery van or expect a family sedan to win Formula 1. The same logic applies to AI agents—choosing the wrong combination of AI models, tools, and data sources will lead to inefficiency, poor performance, or outright failure.

1. 选择合适的AI模型:平衡性能、成本和效率

1. Selecting the Right AI Model: Balancing Power, Cost, and Efficiency

大多数平台都允许您选择智能体使用的AI模型(或模型组合)。这个决定至关重要——它决定了智能体的思考方式、响应方式和信息处理方式。AI模型是智能体的大脑,决定了智能体理解、推理和生成响应的能力。需要考虑两个关键的权衡因素:模型大小和推理能力。

Most platforms let you choose which AI model (or combination of models) your agent will use. This decision is critical—it determines how your agent thinks, responds, and processes information. The AI model is the brain of the agent, dictating how well it understands, reasons, and generates responses. There are two key trade-offs to consider: model size and reasoning capability.

较小的模型,例如 Mini、Phi 或 Flash,速度快、效率高且经济实惠,非常适合简单查询或预定义任务,但在深度推理和复杂问题解决方面表现不佳。另一方面,像 GPT、Claude Opus 或 Gemini Ultra 这样的大型模型能够分析细致入微的信息,生成高质量的回答,并进行批判性思考——但它们消耗更多资源,运行成本也更高。

Smaller models, such as Mini, Phi, or Flash, are fast, efficient, and cost-effective, making them ideal for simple queries or predefined tasks. However, they struggle with deep reasoning and complex problem-solving. On the other hand, larger models like GPT, Claude Opus, or Gemini Ultra can analyze nuanced information, generate high-quality responses, and think critically—but they consume more resources and cost more to operate.

对于成本敏感型应用,通常最佳方案是使用小型模型处理日常任务,而将大型模型用于复杂查询。像 Mistral 或 Llama 70B 这样的开源模型虽然控制性更强、成本更低,但需要内部专业人员进行微调和部署。

For cost-sensitive applications, using a small model for routine tasks and reserving a larger model for complex queries is often the best approach. Open-source models like Mistral or Llama 70B provide more control and lower costs but require in-house expertise to fine-tune and deploy.
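The small-model/large-model split described above can be implemented as a simple router. The model names and the complexity heuristic below are illustrative placeholders, not real API identifiers; real systems often use a cheap classifier model for this routing decision.

```python
# Sketch of cost-aware model routing: routine queries go to a small model,
# complex ones to a large model. Names and heuristic are illustrative.

def estimate_complexity(query):
    """Crude heuristic: long or multi-question prompts count as complex."""
    many_words = len(query.split()) > 50
    multi_question = "?" in query[:-1]  # a '?' before the final character
    return "complex" if many_words or multi_question else "routine"

def pick_model(query):
    return "large-model" if estimate_complexity(query) == "complex" else "small-model"
```

The routing heuristic is where the cost savings live: the more routine traffic it correctly keeps on the small model, the lower the average cost per query.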

温度设置

Temperature settings

调整人工智能代理性能的另一个重要因素是温度设置,许多平台都提供此功能。温度设置控制着人工智能响应的确定性和创造性,从而影响代理的可靠性和灵活性。

Another important factor in tuning your AI agent’s performance is adjusting the temperature settings, a feature available on many platforms. The temperature setting controls how deterministic or creative the AI’s responses will be, impacting both the agent’s reliability and flexibility.

较低的温度(例如 0 到 0.3)能使人工智能更可预测、更稳定,更倾向于给出基于事实的答案,并最大限度地减少随机性。这对于准确性至关重要的应用场景非常理想,例如新闻简报摘要生成器、研究助手或法律人工智能工具。您希望人工智能生成简洁、基于事实的摘要,而不是产生语气或内容上意料之外的变化。

A low temperature (e.g., 0 to 0.3) makes the AI more predictable and consistent, sticking closely to factual answers and minimizing randomness. This is ideal for applications where accuracy is crucial, such as summarization agents for newsletters, research assistants, or legal AI tools. You want the AI to generate concise, fact-based summaries rather than producing unexpected variations in tone or content.

较高的温度(例如 0.7 到 1.0)会使人工智能更具创造力和开放性,从而在措辞和回复方面引入更多变化。这种设置适用于头脑风暴、生成营销文案或撰写创意内容,在这些场景中,原创性比严格的准确性更为重要。然而,在像摘要代理这样的结构化环境中,较高的温度可能会导致前后矛盾、出现幻觉或输出过于冗长,从而偏离原文含义。

A high temperature (e.g., 0.7 to 1.0) makes the AI more creative and open-ended, introducing variety in phrasing and responses. This setting is useful for brainstorming, generating marketing copy, or writing creative content, where originality is more important than strict accuracy. However, in structured environments like a summarization agent, a high temperature can lead to inconsistencies, hallucinations, or overly wordy outputs that stray from the original meaning.
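One practical pattern is to fix a temperature per task type rather than choosing it ad hoc per call. The task categories and exact values below are illustrative defaults drawn from the ranges discussed above.

```python
# Sketch: mapping task types to temperature settings, following the
# low-temperature/high-temperature guidance above. Values are illustrative.

TEMPERATURE_BY_TASK = {
    "summarization": 0.2,       # factual, consistent output
    "research_assistant": 0.1,
    "brainstorming": 0.9,       # creative, varied output
    "marketing_copy": 0.8,
}

def temperature_for(task, default=0.3):
    """Look up the temperature for a task, falling back to a safe default."""
    return TEMPERATURE_BY_TASK.get(task, default)
```

A conservative default for unknown tasks keeps an unclassified request from accidentally running at a high-creativity setting.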

为了更深入地了解如何为人工智能代理选择合适的人工智能模型,可以参考 Hugging Face 的模型对比(针对代理),以及 OpenAI、Anthropic 或 DeepMind 等机构的人工智能研究论文,这些资源可以提供有价值的基准。

For deeper insights into AI model selection for AI agents, resources like Hugging Face model comparisons for agents and AI research papers from OpenAI, Anthropic, or DeepMind can provide valuable benchmarks.

2. 选择合适的工具:精准胜于蛮力

2. Choosing the Right Tools: Precision Over Power

人工智能代理并非仅仅是文本生成器——它必须与外部系统交互、获取实时数据并采取行动。正如公司不会给予员工不受限制的系统访问权限一样,人工智能代理的工具集也必须明确定义、加以限制并受到监控。

An AI agent isn’t just a text generator—it must interact with external systems, retrieve real-time data, and take action. Just as a company wouldn’t give an employee unrestricted access to its systems, an AI agent’s toolset must be clearly defined, limited, and monitored.

在我们的新闻通讯项目中,我们最初赋予了搜索代理对网络爬虫工具的无限制访问权限。结果导致大量来自不可靠来源的内容涌入,以及不堪重负的网站偶尔造成的服务器阻塞。我们由此认识到,详细定义每个工具的用途、限制和使用参数至关重要。对于我们的搜索代理而言,这意味着不仅要指定它可以使用哪些 API,还要具体说明如何使用。您可以参考附录,其中提供了我们代理的详细工具规范。您可以将其作为模板来编写您自己的规范。

During our newsletter project, we initially gave our Search Agent unrestricted access to web scraping tools. The result was a flood of content from unreliable sources and occasional server blocks from overwhelmed websites. We learned the importance of defining each tool’s purpose, limitations, and usage parameters in detail. For our Search Agent, this meant specifying not just which APIs it can use, but exactly how. You can refer to the appendix, where we provide you with the detailed tool specifications of our agent. You can use it as a template for writing your own.

API 是最常用的工具,它允许代理获取实时信息或执行操作。例如,金融代理可以使用市场数据 API 获取股票价格,而日程安排助手则依赖日历 API。然而,不受限制的 API 访问可能会导致速率限制、安全漏洞或法律问题——许多网络爬虫代理就因为过度抓取而被封禁或封号。

APIs are the most common tools, allowing agents to fetch real-time information or execute actions. A financial agent, for example, might use a market data API to pull stock prices, while a scheduling assistant relies on a calendar API. However, unrestricted API access can lead to rate limits, security vulnerabilities, or legal issues—as was the case with many web-scraping agents that ended up blocked or banned for excessive crawling.

为防止滥用,代理需要制定结构化的使用策略,明确定义操作边界。应指定以下几个关键参数:

To prevent misuse, agents need structured usage policies that define clear operational boundaries. Several key parameters should be specified:

速率限制与成本控制——为确保效率和成本控制,代理必须在设定的每分钟请求次数限制内运行。许多 API 按请求收费,这意味着不受控制的使用会导致高昂的成本。此外,超出提供商设定的限制可能会触发限流,API 会减慢速度或暂时阻止请求以防止系统过载。通过管理请求频率,代理可以避免不必要的费用,同时保持流畅不间断的性能。

Rate Limits & Cost Control—To ensure efficiency and cost control, the agent must operate within a set request limit per minute. Many APIs charge per request, meaning uncontrolled usage can lead to high costs. Additionally, exceeding provider-imposed limits can trigger throttling, where the API slows down or temporarily blocks requests to prevent system overload. By managing request frequency, the agent avoids unnecessary expenses while maintaining smooth and uninterrupted performance.

信息来源可靠性——为了防止信息误导,代理程序应仅从预先批准的可靠来源提取数据。这可能包括将可信的新闻媒体列入白名单,或根据质量标准过滤内容。例如,我们的研究代理程序被编程为仅从预先批准的信誉良好的来源(例如 MIT Tech Review、arXiv 或 Harvard Business Review)提取信息。

Source Reliability—The agent should only pull data from pre-approved, credible sources to prevent misinformation. This might include whitelisting trusted news outlets or filtering content based on quality standards. For example, our research agent was programmed to pull information only from pre-approved, reputable sources such as MIT Tech Review, arXiv, or Harvard Business Review.
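
A whitelist check of this kind can be a few lines of code. The sketch below uses the sources named above; the exact domains (`technologyreview.com`, `arxiv.org`, `hbr.org`) are our assumed mappings for those publications:

```python
from urllib.parse import urlparse

# Assumed domains for the pre-approved sources mentioned in the text.
APPROVED_DOMAINS = {"technologyreview.com", "arxiv.org", "hbr.org"}

def is_approved(url):
    """Accept a URL only if its host is (a subdomain of) an approved domain."""
    host = urlparse(url).netloc.lower()
    return any(host == d or host.endswith("." + d) for d in APPROVED_DOMAINS)
```

Every candidate article runs through this gate before the agent spends any tokens processing it.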

安全的 API 访问——API 密钥应定期轮换,并使用身份验证机制(例如,基于令牌的访问控制)进行保护,以防止未经授权的使用。密钥绝不能硬编码到脚本中。

Secure API Access—API keys should be rotated regularly and protected with authentication mechanisms (e.g., token-based access) to prevent unauthorized use. They should never be hardcoded into scripts.
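
In practice this means reading keys from the environment (or a secrets manager) at runtime. A minimal sketch, with an assumed variable name:

```python
import os

def get_api_key(name="NEWS_API_KEY"):
    """Fetch an API key from the environment; never hardcode it in scripts.
    The variable name here is illustrative."""
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"Missing environment variable: {name}")
    return key
```

Rotation then becomes an operational task (update the stored secret) rather than a code change.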

熔断机制——如果连续出现故障,则停止执行,而不是继续发送错误的请求。例如,如果 API 连续失败三次,代理应暂停并通知管理员,而不是无限期地重试。

Circuit Breakers—Stop execution if repeated failures occur, rather than continuing to send faulty requests. For example, if an API fails three times in a row, the agent should pause and notify an administrator instead of retrying indefinitely.
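
The three-strikes behavior described here can be sketched as a small wrapper. Class and parameter names are illustrative, and `notify` stands in for whatever alerting channel reaches an administrator:

```python
class CircuitBreaker:
    """Open the circuit after `max_failures` consecutive failures,
    blocking further calls until an administrator intervenes."""

    def __init__(self, max_failures=3, notify=print):
        self.max_failures = max_failures
        self.failures = 0
        self.open = False
        self.notify = notify

    def call(self, fn, *args, **kwargs):
        if self.open:
            raise RuntimeError("circuit open: awaiting administrator")
        try:
            result = fn(*args, **kwargs)
            self.failures = 0  # any success resets the count
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.open = True
                self.notify("API failed repeatedly; agent paused")
            raise
```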

备用系统——当出现故障时,切换到备用流程以确保业务连续性。例如,如果新闻 API 出现故障,人工智能可以从备用来源获取新闻标题,而不是返回错误。

Fallback Systems—Ensure continuity by switching to a backup process when something goes wrong. For example, if a news API is down, the AI can pull headlines from a secondary source instead of returning an error.
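
A fallback can be as simple as chaining fetch functions; `primary` and `secondary` here are caller-supplied placeholders for, say, a news API client and a backup source:

```python
def fetch_headlines(primary, secondary):
    """Try the primary source; on any failure, degrade to the backup
    instead of returning an error to the user."""
    try:
        return primary()
    except Exception:
        return secondary()
```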

如果没有这些安全机制,人工智能代理可能会超负荷运行,令用户感到沮丧,甚至损害业务运营。如果您想进一步探索人工智能工具集成,可以参考本书第二部分。您还可以参考 OpenAI139、Google140 和 AWS141 的 API 文档,这些文档为负责任的实施提供了极佳的指导。

Without these safety nets, an AI agent can overload itself, frustrate users, and even damage business operations. For those looking to explore AI tool integrations further, you can refer to Part 2 of this book. You can also refer to API documentation from OpenAI,139 Google,140 and AWS,141 which provides excellent guidance on responsible implementation.

选择知识来源:可靠人工智能代理的基础

Selecting Knowledge Sources: The Foundation of Reliable AI Agents

人工智能体必须从可靠来源检索和处理信息。如果这些来源定义不明确,人工智能体可能会产生幻觉、提取错误数据或提供误导性响应。构建人工智能体知识库主要有三种方式:数据库、应用程序接口(API)和文档嵌入。

An AI agent must retrieve and process information from reliable sources. If these sources are poorly defined, the agent may hallucinate, pull incorrect data, or provide misleading responses. There are three primary ways to structure an agent’s knowledge base: databases, APIs, and document embeddings.

数据库最适合存储内部知识,例如客户历史记录、政策或专有研究成果。而应用程序接口 (API) 则确保代理能够获取实时数据,例如天气预报、法律更新或财务统计数据。同时,文档嵌入技术使人工智能能够从大量文本中搜索和检索信息,因此非常适合法律、学术和企业级人工智能解决方案。

A database is best for internal knowledge—such as customer histories, policies, or proprietary research. APIs, on the other hand, ensure the agent can fetch live data, such as weather forecasts, legal updates, or financial statistics. Meanwhile, document embeddings allow AI to search and retrieve information from large collections of text, making them ideal for legal, academic, and enterprise AI solutions.

然而,并非所有知识都应该被纳入。如果代理能够获取过时或带有偏见的信息来源,就存在传播错误信息的风险。同样,未经筛选的开放式网络搜索也很危险,因为它可能导致检索到低质量或误导性的内容。相反,人工智能代理应该使用可信的、特定领域的数据集进行训练,并限制其生成推测性信息。

However, not all knowledge should be included. If the agent has access to outdated or biased sources, it risks spreading misinformation. Similarly, unfiltered open-web search can be dangerous, as it may lead to the retrieval of low-quality or misleading content. Instead, AI agents should be trained on trusted, domain-specific datasets and restricted from generating speculative information.

对改进知识选择策略感兴趣的读者可以参考本书第二部分。此外,还可以参考诸如 Weaviate142 或 Pinecone143 等向量数据库平台,以及斯坦福人工智能实验室144 的学术研究,这些资源提供了更深入的见解。

Those interested in refining knowledge selection strategies can refer to Part 2 of this book. You can also refer to vector database platforms like Weaviate142 or Pinecone,143 as well as academic research from Stanford AI Lab,144 which provide advanced insights.

最终建议:构建更智能而非更费力的AI

Final Recommendations: Building AI That Works Smarter, Not Harder

根据我们的经验,最好的AI代理并非拥有最强大的能力,而是智能、工具和知识达到最佳平衡的代理。保持模型高效运行、限制工具访问权限以及精心挑选高质量的知识来源,是确保可靠性、安全性和可信度的关键。

Based on our experience, the best AI agents aren’t the ones with the most power, but the ones with the right balance of intelligence, tools, and knowledge. Keeping models efficient, restricting tool access, and curating high-quality knowledge sources ensures reliability, security, and trustworthiness.

配置不当的人工智能充其量是浪费资源,最坏的情况是危险的。关键在于从清晰的定义、严格的参数和可靠的信息来源入手,确保智能体在可控且可预测的范围内运行。

Poorly configured AI is wasteful at best and dangerous at worst. The key is to start with clear definitions, strict parameters, and reliable sources, ensuring the agent operates within controlled and predictable boundaries.

E – 执行与工作流程(代理如何工作?)

E – Execution & Workflow (How does the agent work?)

一旦你的 AI 代理拥有了明确的身份,并配备了合适的装备和大脑,就该专注于执行——代理在现实世界中实际如何运作。

Once your AI agent has a clearly defined identity and is equipped with the right gear and brain, it’s time to focus on execution—how the agent actually operates in real-world conditions.

许多人工智能项目失败并非因为模型不好或缺少工具,而是因为其工作流程设计不佳。缺乏清晰的结构,代理就可能行为不可预测、错误地处理输入数据,或做出前后不一致的响应。就像训练有素的员工遵循结构化的工作流程一样,人工智能代理也需要明确的输入格式、逻辑清晰的工作流程以及明确的触发条件,以决定其何时以及如何运行。

Many AI projects fail not because of bad models or missing tools, but because their workflows are poorly designed. Without a clear structure, an agent risks acting unpredictably, handling inputs incorrectly, or responding inconsistently. Just like a well-trained employee follows a structured work process, an AI agent needs a defined input format, a logical workflow, and clear triggers that dictate when and how it operates.

定义输入和输出:使用正确的语言

Defining Input & Output: Speaking the Right Language

人工智能代理要正常运行,必须先理解它接收和生成的数据的格式。

Before an AI agent can function properly, it must understand the format of the data it receives and produces.

输入和输出的精确定义看似显而易见,但实际上却是代理设计中最关键的方面之一。我们的新闻通讯系统就给我们上了这一课。最初,我们对摘要代理如何从搜索代理接收内容的定义非常笼统。结果呢?数据格式不匹配、元数据缺失、内容结构不一致,导致错误不断。

The precise definition of inputs and outputs might seem obvious, but it’s actually one of the most crucial aspects of agent design. We learned this lesson with our newsletter system. Initially, we had loosely defined how our Summarization Agent should receive content from the Search Agent. The result? Constant errors from mismatched data formats, missing metadata, and inconsistent content structures.

想想如果我们没有正确指定文章 URL 的输入格式会发生什么。有些文章 URL 带有查询参数,有些则没有;有些包含锚文本标签,有些则是纯 URL。这种看似微小的疏忽导致我们的系统有时会多次处理同一篇文章,甚至完全漏掉文章。通过定义精确的输入规范——包括 URL 格式、必需的元数据字段和内容结构——我们彻底解决了这些问题。

Consider what happened when we didn’t properly specify the input format for article URLs. Some came with query parameters, others without; some included anchor tags, and others were clean URLs. This seemingly minor oversight caused our system to sometimes process the same article multiple times or miss articles entirely. By defining exact input specifications—including URL formatting, required metadata fields, and content structure—we eliminated these issues entirely.
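
A URL-canonicalization step of the kind that fixed this can be sketched in a few lines. This is a simplified illustration; real pipelines may also need to strip tracking parameters selectively:

```python
from urllib.parse import urlparse, urlunparse

def normalize_url(url):
    """Reduce a URL to scheme, host, and path so that variants with
    query parameters or fragments map to the same article."""
    p = urlparse(url)
    return urlunparse((p.scheme, p.netloc.lower(), p.path.rstrip("/"), "", "", ""))
```

With every incoming URL normalized first, the same article can no longer be processed twice under different spellings.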

同样的原则也适用于输出。我们最初推出摘要代理时,它生成的摘要有时会过长,超出邮件模板的限制,导致我们需要在最后时刻进行手动编辑。通过指定精确的输出参数——字符限制、必填部分和格式规则——我们确保了每个代理的输出都能无缝衔接到流程的下一阶段。

The same principle applies to outputs. When we first launched, our Summarization Agent would sometimes produce summaries that were too long for our email template, forcing last-minute manual editing. By specifying exact output parameters—character limits, required sections, and formatting rules—we ensured that each agent’s output seamlessly fed into the next stage of the process.

为防止此类问题,每个人工智能代理都应该具备:

To prevent such issues, every AI agent should have:

严格的输入验证:确保代理程序只能处理其设计支持的格式。

Strict input validation: Ensure the agent can process only the formats it’s designed for.

明确的输出定义:接收系统应该始终知道预期结果。

Clear output definitions: The receiving system should always know what to expect.
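
As a concrete illustration of such an output contract, the check below validates a summarization result before it is passed downstream; the field names and the 600-character limit are assumptions, not the book's actual specification:

```python
REQUIRED_FIELDS = {"title", "summary", "source_url"}
MAX_SUMMARY_CHARS = 600  # assumed email-template limit

def validate_output(item):
    """Return a list of violations; an empty list means the output
    conforms and can be handed to the next agent."""
    errors = [f"missing field: {f}" for f in REQUIRED_FIELDS - item.keys()]
    if len(item.get("summary", "")) > MAX_SUMMARY_CHARS:
        errors.append("summary exceeds character limit")
    return errors
```

Running this at every agent boundary turns format mismatches into explicit, debuggable errors instead of silent downstream failures.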

您可以参考附录,其中我们为您提供了此类详细示例。

You can refer to the appendix, where we provide you with such detailed examples.

对于那些处理海量或关键数据的用户来说,像用于结构化数据处理的 OpenAPI145 或用于验证的 JSON Schema146 这样的框架,可以帮助标准化人工智能工作流程中的输入和输出。

For those dealing with high-volume or critical data processing, frameworks like OpenAPI for structured data handling145 or JSON Schema for validation can help standardize inputs and outputs across AI workflows.146

设计工作流程:构建人工智能的决策过程

Designing the Workflow: Structuring the AI’s Decision-Making Process

人工智能代理并非独立运行——它被激活后,会按照一系列步骤处理信息并做出决策。一个定义完善的工作流程不仅规定了步骤,还规定了步骤之间的转换。如果没有定义明确的工作流程,它可能会陷入不必要的循环,在错误的时间执行操作,或者产生不一致的行为。

An AI agent doesn’t operate in isolation—it is activated and then follows a sequence of steps to process information and make decisions. A properly defined workflow specifies not just the steps, but the transitions between them. Without a defined workflow, it may loop unnecessarily, execute actions at the wrong time, or produce inconsistent behavior.

一个结构良好的工作流程包括:

A well-structured workflow includes:

1.激活条件:代理何时开始工作?它是等待用户输入、API 调用还是计划任务?主要有三种触发类型:

1. Activation Criteria: When does the agent start working? Does it wait for user input, an API call, or a scheduled task? There are three primary types of triggers:

用户输入:代理仅在被提示时才会做出响应(例如,聊天机器人、虚拟助手)。

User Input: The agent responds only when prompted (e.g., chatbots, virtual assistants).

API 调用:当另一个系统请求数据时(例如,AI 驱动的自动化),这些调用会被激活。

API Calls: These activate when another system requests data (e.g., AI-driven automation).

计划执行:按固定时间间隔运行(例如,每日报告、后台数据处理)。

Scheduled Execution: It runs at fixed intervals (e.g., daily reports, background data processing).

2. 处理步骤:智能体采取哪些操作顺序?例如,它是先获取数据,分析数据,然后生成响应吗?它是否需要在采取行动之前验证信息?

2. Processing Steps: What sequence of actions does the agent take? For example, does it fetch data first, analyze it, then generate a response? Does it need to verify information before taking action?

例如,我们新闻简报的摘要代理遵循以下工作流程:

For example, our Summarization Agent for our newsletter follows this workflow:

1.激活条件:研究代理从 MIT Tech Review 和 arXiv 研究论文等来源上传或检索一批新的文章和研究论文。

1. Activation Criteria: A new batch of articles and research papers is uploaded or retrieved from sources like MIT Tech Review and arXiv research papers by the Research Agent.

2.处理步骤:代理处理内容,提取关键见解并过滤掉不相关或重复的信息。

2. Processing Steps: The agent processes the content, extracting key insights and filtering out irrelevant or duplicate information.
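
The two steps above can be sketched as a single function that is invoked whenever a new batch arrives; the deduplicate-by-title rule and the `relevant` flag are illustrative simplifications:

```python
def run_summarization_workflow(new_articles):
    """Activation: called when the Research Agent delivers a batch.
    Processing: drop irrelevant items and duplicates, keep the rest."""
    seen_titles = set()
    selected = []
    for article in new_articles:
        title = article["title"].strip().lower()
        if article.get("relevant", True) and title not in seen_titles:
            seen_titles.add(title)
            selected.append(article)
    return selected
```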

完善的工作流程可以防止效率低下,并确保代理以合乎逻辑且可预测的方式运行。

A well-defined workflow prevents inefficiencies and ensures the agent operates logically and predictably.

要构建强大且可扩展的人工智能,精心设计执行流程与选择合适的模型同等重要。为了获得更深入的了解,开发人员可以探索 OpenAI 的 API 最佳实践147、BPMN 的工作流自动化策略148,以及 AWS Lambda149 和 Google Cloud Functions150 等云计算框架中使用的事件驱动架构。

To build a robust, scalable AI, structuring execution carefully is just as important as selecting the right model. For deeper insights, developers can explore API best practices from OpenAI,147 workflow automation strategies from BPMN,148 and event-driven architectures used in cloud computing frameworks like AWS Lambda149 and Google Cloud Functions.150

实施故障安全机制:确保人工智能在压力下的可靠性

Implementing Fail-Safes: Ensuring AI Reliability Under Stress

任何人工智能系统都无法完美运行。故障——无论是由于代理程序故障、性能下降还是外部干扰——都不可避免。真正的挑战不在于避免故障,而在于有效管理故障,从而最大限度地减少损失并维持服务的连续性。

No AI system operates flawlessly. Failures—whether due to malfunctioning agents, degraded performance, or external disruptions—are inevitable. The real challenge is not avoiding failures but managing them effectively to minimize damage and maintain service continuity.

通过实施熔断机制和结构化错误处理,人工智能代理能够及早发现问题,防止级联故障,并确保优雅恢复。这些故障保护措施如同系统的免疫系统,既能保障性能,又能维护用户信任。

By implementing circuit breakers and structured error handling, AI agents can detect issues early, prevent cascading failures, and ensure graceful recovery. These fail-safes act as the system’s immune system, protecting both performance and user trust.

1. 错误处理和恢复:保持人工智能功能正常

1. Error Handling and Recovery: Keeping AI Functional

人工智能必须被设计成能够智能恢复。最好的系统不会直接崩溃,而是采用三步恢复方法:

AI must be designed to recover intelligently. Instead of failing outright, the best systems use a three-step recovery approach:

1.自动恢复:系统会自动重试失败的进程,每次尝试之间等待更长时间,以避免过载,同时增加成功的机会。

1. Automated Recovery: The system automatically retries failed processes, waiting longer between each attempt to avoid overload while increasing the chances of success.

2.优雅降级:如果某个功能反复出现故障,AI 会降低其复杂度,而不是完全停止运行。例如,如果摘要代理遇到错误,它仍然可以转发文章链接和元数据,而不是完全停止工作。

2. Graceful Degradation: If a function fails repeatedly, the AI reduces complexity instead of stopping entirely. For example, if a summarization agent encounters errors, it can still forward article links and metadata rather than fail completely.

3.人工介入升级:如果人工智能无法有把握地解决问题,它会上报给人工,并提供相关的背景信息和建议的解决方案,而不是简单地失败。

3. Human Escalation: If the AI can’t resolve an issue with confidence, it flags a human, providing relevant context and suggested resolutions rather than simply failing.

这种结构化的方法确保人工智能故障得到控制,用户仍然能够获得有用的输出,关键服务即使在压力下也能继续运行。

This structured approach ensures that AI failures remain controlled, users still receive useful outputs, and critical services continue running even under stress.
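
The three steps can be composed into one wrapper. This is a hedged sketch: all callables are supplied by the caller, and the backoff delays are illustrative:

```python
import time

def resilient_call(fn, fallback, escalate, retries=3, base_delay=1.0):
    """Step 1: retry with exponential backoff. Step 2: degrade to a
    simpler fallback. Step 3: escalate to a human with context."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            time.sleep(base_delay * (2 ** attempt))
    try:
        return fallback()
    except Exception as exc:
        escalate(f"manual intervention needed: {exc}")
        return None
```

For the summarization example above, `fallback` would return article links and metadata, and `escalate` would message an administrator.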

2. 断路器:防止系统性故障

2. Circuit Breakers: Preventing System-Wide Failures

如果错误持续存在,断路器就像人工智能的免疫系统,会在故障造成大范围损害之前将其阻止。如果没有断路器,智能体可能会陷入错误循环、处理不可靠的数据,或者在无人干预的情况下质量下降。

If the errors persist, circuit breakers act as an AI’s immune system, stopping faulty processes before they cause widespread damage. Without them, agents can loop on errors, process unreliable data, or degrade in quality without intervention.

我们曾为此吃过苦头:一个故障的摘要代理连续两个小时生成无意义的内容,一个被攻破的新闻源注入了误导性信息,一次重大事件期间的系统过载导致摘要质量低下。为了避免这些问题,我们设定了质量阈值:如果摘要代理在一分钟内三次生成无意义的内容,系统将自动暂停,并通过发送消息上报给人工处理。

We learned this the hard way when a malfunctioning summarization agent produced gibberish for two hours, a compromised news source injected misleading content, and system overload during a major event resulted in poor-quality summaries. To prevent these issues, we set quality thresholds—if a summarization agent produces gibberish three times in a minute, it is paused automatically and escalated to a human by sending a message.

在一次大型科技会议期间,我们的系统不堪重负,导致内容质量下降。现在,熔断机制能够检测并阻止低质量内容在到达用户之前发布,从而避免声誉受损。AutoGen 和 CrewAI 等平台提供内置的安全保障措施,可以配置为在出现异常情况时停止执行。

During a major tech conference, our system became overwhelmed, degrading content quality. Circuit breakers now detect and block low-quality outputs before they reach users, preventing reputational damage. Platforms like AutoGen and CrewAI offer built-in safeguards that can be configured to stop execution when anomalies occur.

3. 向人类求助:何时人工智能应该退居幕后

3. Escalation to a Human: Knowing When AI Should Step Back

有些决策过于复杂或风险过高,人工智能无法独立处理。在这种情况下,将问题上报给人工处理对于确保准确性、合规性和信任至关重要。精心设计的人工智能不应盲目地将问题转交给人工处理,而应提供结构化的升级流程。

Some decisions are too complex or high-risk for AI to handle alone. In these cases, escalating to a human is critical to maintaining accuracy, compliance, and trust. Instead of blindly handing off the problem, a well-designed AI should provide structured escalation.

例如,在新闻简报的摘要代理中,当人工智能检测到相互矛盾或置信度较低的摘要时,就会触发升级机制。如果多个来源报道了同一事件,但提供了相互矛盾的细节,例如公司盈利数据不一致或产品发布信息前后矛盾,摘要代理不会尝试“猜测”真相,而是会将问题标记给人工编辑,并提供以下信息:

For example, in the summarization agent for the newsletter, escalation occurs when the AI detects conflicting or low-confidence summaries. If multiple sources report the same event but provide contradictory details—such as differing figures on a company’s earnings or inconsistent claims about a product launch—the summarization agent does not attempt to “guess” the truth. Instead, it flags the issue to a human editor, providing:

1.为本摘要提供依据的原始文章。

1. The original source articles that contributed to the summary.

2.对不同来源检测到的不一致之处进行详细分析。

2. A breakdown of inconsistencies detected across sources.

3.置信度评分,用于表示人工智能生成的摘要的可靠性。

3. A confidence score indicating the reliability of the AI-generated summary.

这确保编辑拥有所有必要的背景信息,从而做出明智的决定,而不是从零开始。同样,如果人工智能无法生成连贯的摘要——可能是由于术语过多、人工智能生成的内容污染或写作质量差——系统会将问题转交给人工处理,并提供替代资料来源建议或请求人工干预。

This ensures that the editor has all the necessary context to make an informed decision rather than starting from scratch. Similarly, if the AI fails to generate a coherent summary—perhaps due to excessive jargon, AI-generated content pollution, or poor-quality writing—it escalates to a human with suggested alternative sources or a request for manual intervention.
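
The escalation payload described above maps naturally onto a small record; the field names and the 0.75 confidence threshold are assumptions for illustration:

```python
CONFIDENCE_THRESHOLD = 0.75  # assumed cutoff below which a human reviews

def build_escalation(source_articles, inconsistencies, confidence):
    """Return None when the summary can ship as-is; otherwise package
    the three items an editor needs to decide without starting over."""
    if confidence >= CONFIDENCE_THRESHOLD and not inconsistencies:
        return None
    return {
        "source_articles": source_articles,
        "inconsistencies": inconsistencies,
        "confidence_score": confidence,
    }
```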

对于想要进一步探索这些主题的读者,微软的《AI 信任和安全原则》151或 OpenAI 的《API 最佳实践》152等资源提供了关于故障保护、错误处理、断路器和人机升级策略的深入见解。

For readers who want to explore these topics further, resources such as Microsoft’s AI Trust and Safety Principles,151 or OpenAI’s API Best Practices152 provide in-depth insights into fail-safes, error handling, circuit breakers, and human-AI escalation strategies.

N – 导航与规则:人工智能代理如何做出决策

N – Navigation & Rules: How the AI Agent Makes Decisions

现在我们的人工智能代理已经拥有了明确的身份、合适的工具和结构化的工作流程,我们必须解决人工智能设计中最容易被忽视但又至关重要的一个方面:代理如何做出决策。人工智能体的性能取决于其决策框架。如果没有明确的决策规则,人工智能体就会变得难以预测、前后矛盾,而且——最危险的是——无法控制。

Now that our AI agent has a clear identity, the right tools, and a structured workflow, we must address one of the most overlooked but critical aspects of AI design: how the agent makes decisions. An AI agent is only as good as its decision-making framework. Without well-defined navigation rules, the agent becomes unpredictable, inconsistent, and—most dangerously—uncontrollable.

把人工智能代理想象成一辆自动驾驶汽车。你不会只告诉一辆自动驾驶汽车“开到目的地”,就指望它自己解决所有问题。它需要规则来导航:

Think of an AI agent as an autonomous vehicle. You wouldn’t just tell a self-driving car, “Drive to the destination” and expect it to figure everything out on its own. It needs rules to navigate:

允许哪些路线?

Which routes are allowed?

它应该如何应对障碍物?

How should it handle obstacles?

如果道路被堵塞了怎么办?

What happens if the road is blocked?

人工智能体需要类似的规则才能可靠地运行。如果没有这些规则,它们就会做出任意选择——或者更糟,做出错误选择。

AI agents need similar rules to function reliably. Without these rules, they make arbitrary choices—or worse, incorrect ones.

定义处理规则:过滤、优先级排序和决策逻辑

Defining Processing Rules: Filtering, Prioritization, and Decision Logic

每个人工智能代理都必须决定哪些信息重要,哪些信息应该忽略,以及如何确定任务的优先级。如果没有规则,它可能会陷入无关数据的泥潭,返回不一致的结果,或者给出糟糕的建议。

Every AI agent must decide what information matters, what to ignore, and how to prioritize tasks. Without rules, it may get bogged down by irrelevant data, return inconsistent results, or make poor recommendations.

一个结构完善的处理系统包括:

A well-structured processing system includes:

1.过滤机制:智能体必须区分相关数据和无关数据。例如,科研人工智能应该优先考虑同行评审的研究论文,而不是未经核实的博客文章。

1. Filtering Mechanisms: The agent must distinguish between relevant and irrelevant data. A research AI, for example, should prioritize peer-reviewed studies over unverified blog posts.

2.优先级逻辑:有些任务比其他任务更紧急或更有价值。客户支持人工智能应该优先处理紧急投诉,然后再处理一般咨询。

2. Prioritization Logic: Some tasks are more urgent or valuable than others. A customer support AI should escalate urgent complaints before handling general inquiries.

3. 处理能力限制:一次性向智能体输入过多数据会降低其运行速度并增加成本。如果人工智能需要检索数千篇文章,则应根据关键参数进行预筛选,而不是分析所有内容。

3. Processing Limits: Overloading an agent with too much data at once slows it down and increases costs. If an AI retrieves thousands of articles, it should pre-filter based on key parameters rather than analyzing everything.

我们开发搜索代理时,最初只关注关键词匹配和时效性。结果呢?我们得到了技术上相关但往往流于表面的内容,无法为读者提供真正的价值。

When we developed our Search Agent, we initially focused only on keyword matching and recency. The result? We got technically relevant but often superficial content that didn’t provide real value to our readers.

通过扩展处理规则,纳入更精细的相关性评分、来源可信度评估和内容多样性要求,我们将代理从一个简单的搜索工具转变为一个精准的内容策展平台。每条规则都各司其职:相关性评分确保内容价值,可信度评估维护质量标准,多样性要求则防止信息茧房效应。

By expanding our processing rules to include sophisticated relevance scoring, source credibility assessment, and content diversity requirements, we transformed the agent from a simple search tool into a discerning content curator. Each rule serves a specific purpose: relevance scoring ensures content value, credibility assessment maintains quality standards, and diversity requirements prevent echo chamber effects.
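
A combined scoring rule of this kind might look like the sketch below; the weights and the pre-filter limit are illustrative assumptions, not our production values:

```python
WEIGHTS = {"relevance": 0.5, "credibility": 0.3, "diversity": 0.2}

def score_article(relevance, credibility, diversity):
    """Each input is a 0-1 score; return the weighted total."""
    parts = {"relevance": relevance, "credibility": credibility,
             "diversity": diversity}
    return sum(WEIGHTS[k] * v for k, v in parts.items())

def top_articles(scored, limit=5):
    """Pre-filter to a fixed number of candidates instead of
    processing everything (the processing limit described above)."""
    return sorted(scored, key=lambda a: a["score"], reverse=True)[:limit]
```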

未能明确这些规则可能导致效率低下、产生偏差,甚至做出错误的决策。要深入了解该主题,您可以参考信息检索方法论153、决策树框架154,以及机器学习模型中用于完善这些规则的排序算法155。

Failing to define these rules can lead to inefficiencies, bias, or even incorrect decision-making. To dive deeper into the topic, you can refer to information retrieval methodologies,153 decision tree frameworks,154 and ranking algorithms used in machine learning models to refine these rules.155

确保透明度:为人工智能问责制创建决策路径

Ensuring Transparency: Creating Decision Trails for AI Accountability

人工智能的决策绝不应该像个黑箱——用户需要了解人工智能做出特定反应的原因。如果人工智能提供财务建议或拒绝请求,其决策背后必须有可追溯的推理过程。

AI decisions should never feel like a black box—users need to understand why the agent responded a certain way. If an AI provides a financial recommendation or denies a request, there must be traceable reasoning behind the decision.

实现这一目标的最佳方法之一是实施决策轨迹——记录并解释人工智能思考过程的日志。这些日志应该:

One of the best ways to achieve this is by implementing Decision Trails—logs that record and explain the AI’s thought process. These logs should:

获取输入参数:哪些数据影响了人工智能的决策?

Capture input parameters: What data influenced the AI’s decision?

显示处理步骤:它是如何对选项进行排序、筛选信息或应用逻辑的?

Show processing steps: How did it rank options, filter information, or apply logic?

请给出理由:为什么它选择这个选项而不是另一个选项?

Provide justifications: Why did it choose one option over another?

例如,如果我们的简报研究代理选择了一篇研究论文进行摘要,其日志应包含:

For example, if our newsletter research agent selects a research paper to be summarized, its log should include:

它引用的来源(例如,MIT Tech Review、arXiv、Harvard Business Review)以及文章的撰写日期。

The sources it pulled from (e.g., MIT Tech Review, arXiv, Harvard Business Review) and the date the article was written.

选择该来源的理由,例如与当前新闻通讯主题的相关性以及属于白名单来源。

The reasoning behind its selection, such as relevance to the current newsletter topic and belonging to the whitelisted sources.

它考虑过但最终放弃的其他文章,以及放弃的理由。

Any alternative articles it considered but discarded, along with the rationale.

通过保持决策透明度,摘要代理确保了编辑监督,建立了读者的信任,并允许改进其选择过程——使其从一个简单的自动化工具转变为一个可靠的、负责任的研究伙伴。

By maintaining decision transparency, the summarization agent ensures editorial oversight, builds trust with readers, and allows for refinement in its selection process—transforming it from a simple automation tool into a reliable, accountable research partner.

大多数人工智能代理平台都提供内置的日志记录钩子或中间件,这些钩子或中间件可以进行自定义,以生成决策轨迹,从而提高透明度。这些日志可以通过仪表板访问,用于审查或调试。关键在于将日志记录集成到代理执行任务的工作流程的关键节点。每个代理(例如,搜索、摘要)捕获输入数据(例如,关键词、文章)、采取的操作(例如,摘要方法、相关性评分)和输出(例如,选定的文章、摘要)。

Most AI agent platforms provide built-in hooks or middleware for logging, which can be customized to generate decision trails for transparency. These logs can be accessed via dashboards for review or debugging. The key is to integrate logging at key points in the workflow where agents perform tasks. For each agent (e.g., Search, Summarization), capture input data (e.g., keywords, articles), the actions taken (e.g., summarization method, relevance score), and outputs (e.g., selected articles, summaries).
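
A decision-trail entry along those lines can be sketched as follows; the field names mirror the inputs/actions/outputs triad above, while the in-memory list stands in for a real log store or dashboard:

```python
import json
import time

def log_decision(agent, inputs, actions, outputs, trail):
    """Append one structured decision-trail entry and return it as JSON
    so it can also be shipped to an external logging backend."""
    entry = {
        "timestamp": time.time(),
        "agent": agent,
        "inputs": inputs,      # e.g. keywords, candidate articles
        "actions": actions,    # e.g. relevance scores, filters applied
        "outputs": outputs,    # e.g. selected articles, summaries
    }
    trail.append(entry)
    return json.dumps(entry)
```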

金融、医疗和政府领域的 AI 系统已经开始受到透明度方面的监管——这意味着清晰的决策路径很快将成为一项硬性要求,而不仅仅是最佳实践。要了解更多关于日志记录和透明度的内容,您可以探索可解释人工智能(XAI)框架156、GDPR 的 AI 透明度要求等合规性法规157,以及企业 AI 中的审计日志记录实践158 159。

AI systems in finance, healthcare, and government are already being regulated for transparency—meaning clear decision trails will soon be a requirement, not just a best practice. To learn more about the topics of logging and transparency, you can explore Explainable AI (XAI) frameworks,156 compliance regulations like GDPR’s AI transparency requirements,157 and audit logging158 practices in enterprise AI.159

T – 测试与信任:如何改进和扩展人工智能代理

T – Testing & Trust: How to Improve and Scale an AI Agent

构建人工智能代理并非一蹴而就,而是一个持续不断的测试、改进和扩展的过程。即使是设计最完善的代理,在实际部署中也会遇到意料之外的行为、性能问题和局限性。如果没有结构化的改进流程,代理可能会产生不可靠的结果,令用户感到沮丧,或者无法有效扩展。正如员工需要接受绩效考核和培训一样,人工智能代理也必须持续进行测试、监控和优化。

Building an AI agent is not a one-time task—it’s an ongoing process of testing, refining, and scaling. Even the most well-designed agent will encounter unexpected behaviors, performance issues, and limitations when deployed in real-world scenarios. Without a structured improvement cycle, the agent may produce unreliable results, frustrate users, or fail to scale effectively. Just like an employee undergoes performance reviews and training, an AI agent must be continuously tested, monitored, and optimized.

模拟真实世界应用场景:确保人工智能在实验室之外也能有效运行

Simulating Real-World Use Cases: Ensuring the AI Works Beyond the Lab

人工智能代理在受控环境下可能表现良好,但现实世界的用户会带来不可预测的输入、极端情况和挑战。确保可靠性和准确性的唯一方法是在部署前,在各种场景下对代理进行严格测试。

An AI agent may perform well in controlled environments, but real-world users bring unpredictable inputs, edge cases, and challenges. The only way to ensure reliability and accuracy is by rigorously testing the agent in diverse scenarios before deployment.

完善的测试方法应包括:

A well-rounded testing approach should include:

常见场景:代理是否按预期处理标准用户请求?

Common Scenarios: Does the agent handle standard user requests as expected?

特殊情况:它如何处理含糊不清、措辞不当或相互矛盾的输入?

Edge Cases: How does it respond to ambiguous, poorly worded, or conflicting inputs?

故障模拟:如果 API 出现故障或用户提供的信息不完整会发生什么情况?

Failure Simulations: What happens if an API is down or if the user provides incomplete information?

例如,客户支持人工智能不仅要测试其处理基本咨询的能力,还要测试其应对客户因情绪激动而提供的模糊、情绪化或误导性信息的能力。同样,人工智能医疗助手也应进行压力测试,以检测和防止其提供不准确的医疗建议。

For example, a customer support AI must be tested not just for basic inquiries, but also for cases where a frustrated customer provides vague, emotional, or misleading input. Similarly, an AI-powered medical assistant should be stress-tested to detect and prevent inaccurate medical advice.
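
Scenario tests of this kind are easy to automate. The stub below stands in for a real agent call (an assumption); the point is that each category of input gets an explicit assertion:

```python
def summarize(text):
    """Stand-in for an agent endpoint: reject empty input rather than
    inventing content, and cap the output length."""
    if not text or not text.strip():
        raise ValueError("empty input")
    return text.strip()[:100]

# Common scenario: a normal request produces output.
assert summarize("AI agents need structured workflows.") != ""

# Edge case: whitespace-only input is rejected, not hallucinated over.
try:
    summarize("   ")
    rejected = False
except ValueError:
    rejected = True
assert rejected
```

Failure simulations (for example, a downed API) can be added the same way by making the network call raise and asserting that the agent degrades or escalates.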

跳过这一步骤可能会导致人工智能出现幻觉、不恰当的反应或错误的决策,尤其是在用户以未预料的方式进行交互时。对于结构化的测试方法,您可以探索 LangChain 的测试套件160、Hugging Face 的人工智能模型基准测试161,以及 OpenAI 的对抗性测试框架162 等 LLM 评估工具。

Skipping this step can lead to AI hallucinations, inappropriate responses, or incorrect decision-making, especially when users interact in ways that weren’t originally anticipated. For structured testing methods, you can explore LLM evaluation tools like LangChain’s testing suite,160 AI model benchmarking from Hugging Face,161 and adversarial testing frameworks from OpenAI.162

收集反馈和监控日志:从用户和错误中学习

Collecting Feedback & Monitoring Logs: Learning from Users and Mistakes

人工智能代理上线后,持续监控和收集用户反馈对于提升其性能至关重要。即使工作流程结构完善,人工智能的输出仍然可能出现偏差、产生意料之外的响应,或者无法满足用户预期。

Once an AI agent is live, constant monitoring and user feedback collection are critical to refining its performance. Even with well-structured workflows, AI outputs may still deviate, produce unexpected responses, or fail to meet user expectations.

有效的监测包括:

Effective monitoring includes:

用户反馈机制:允许用户对回复进行评分、标记错误答案并提供上下文反馈。

User Feedback Mechanisms: Allow users to rate responses, flag incorrect answers, and provide contextual feedback.

日志分析:记录输入、输出和决策路径,以检测重复出现的错误或效率低下问题。

Log Analysis: Record inputs, outputs, and decision paths to detect recurring errors or inefficiencies.

行为跟踪:识别用户是否经常放弃交互、请求澄清或遇到困难。

Behavior Tracking: Identify whether users abandon interactions, request clarifications, or get stuck frequently.

例如,如果人工智能招聘助手反复将不合格的候选人排在高位,那么查看其决策日志可以帮助找出评分机制中的偏差或问题。同样,如果电商人工智能推荐不相关的产品,用户反馈可以表明它是否误解了用户的偏好,或者是否过度重视某些趋势。

For instance, if an AI-powered recruiting assistant repeatedly ranks unqualified candidates highly, reviewing its decision logs can help pinpoint biases or issues in scoring mechanisms. Similarly, if an e-commerce AI suggests irrelevant products, user feedback can indicate whether it’s misinterpreting preferences or over-prioritizing certain trends.

忽略这一步骤会导致人工智能性能停滞不前、用户感到沮丧并失去信任。例如,您可以利用 LangSmith163 和 OpenTelemetry164 等可观测性工具来跟踪模型行为和用户交互。

Ignoring this step can result in stagnant AI performance, user frustration, and lost trust. For example, you can leverage observability tools like LangSmith163 and OpenTelemetry164 for tracking model behavior and user interactions.

精益求精:微调人工智能以获得更佳结果

Refining & Improving: Fine-Tuning the AI for Better Results

没有哪个人工智能代理在发布之初是完美的。性能优化是一个迭代过程,涉及根据实际结果调整模型参数、更新提示信息和优化工作流程。

No AI agent is perfect at launch. Performance refinement is an iterative process that involves adjusting model parameters, updating prompts, and optimizing workflows based on real-world results.

需要重点优化的领域包括:

Key areas to optimize include:

温度调整:较低的值会使反应更具确定性,而较高的值会鼓励创造力,但会增加随机性。

Temperature Adjustments: Lower values make responses more deterministic, while higher values encourage creativity but increase randomness.

提示工程:修改指令、约束和系统消息可以显著提高响应的准确性和一致性。

Prompt Engineering: Modifying instructions, constraints, and system messages can drastically improve response accuracy and consistency.

工作流程调整:如果某些任务导致延迟、错误或效率低下,调整数据处理或检索方式可以提高性能。

Workflow Tweaks: If certain tasks are causing delays, errors, or inefficiencies, adjusting how data is processed or retrieved can enhance performance.

例如,如果人工智能法律助手难以给出简洁明了的答案,可能需要及时进行改进以强化其简洁性。同时,如果人工智能经常出现幻觉,则可能需要降低温度设置并采用更严格的知识检索机制。

For instance, an AI-powered legal assistant that struggles with concise answers might need prompt refinements to enforce brevity. Meanwhile, an AI that hallucinates too often might benefit from lower temperature settings and stricter knowledge retrieval mechanisms.
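
These knobs are usually just generation parameters. The values below are illustrative assumptions, and the parameter names follow common LLM APIs rather than any specific SDK:

```python
# Deterministic settings for factual work such as summaries.
FACTUAL_CONFIG = {"temperature": 0.2, "top_p": 0.9}
# Freer settings for creative work such as headline brainstorming.
CREATIVE_CONFIG = {"temperature": 0.9, "top_p": 1.0}

def pick_config(task):
    """Route tasks to the appropriate generation settings."""
    return CREATIVE_CONFIG if task == "brainstorm" else FACTUAL_CONFIG
```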

渐进式信任模式

The Progressive Trust Model

人与人工智能体之间的信任并非与生俱来,而是需要逐步建立。我们设计的渐进式信任模型正是为了反映这一现实,确保人工智能系统在展现出可靠性、准确性和透明度的同时,逐步获得自主权。该模型摒弃了人工智能要么完全受控要么完全独立的二元模式,而是分阶段过渡信任,在效率和监督之间取得平衡。

Trust between humans and AI agents isn’t automatic—it must be earned. We have designed the Progressive Trust Model to reflect this reality, ensuring that AI systems gradually gain autonomy as they demonstrate reliability, accuracy, and transparency. Instead of a binary approach where AI is either fully controlled or fully independent, this model transitions trust in stages, balancing efficiency with oversight.

第一阶段:严格监督。在运行的第一个月,人工编辑对人工智能的每一个操作都进行了详细审查。这一密集监督阶段有两个目的:一是确保质量,二是帮助编辑了解人工智能的能力和局限性。

Stage 1: High Oversight. During the first month of operation, human editors reviewed every AI action in detail. This intensive oversight period served two purposes: it ensured quality while helping editors understand the AI’s capabilities and limitations.

第二阶段:选择性审阅。随着系统可靠性的验证,我们转向了更具选择性的审阅流程。编辑们将注意力集中在复杂案例和战略决策上,而日常工作则更多地由人工智能自主处理。

Stage 2: Selective Review. As the system proved its reliability, we shifted to a more selective review process. Editors focused their attention on complex cases and strategic decisions, while routine tasks were handled more autonomously by the AI.

第三阶段:战略监督。在目前的运营中,人工干预主要集中在战略方向和特殊情况的处理上。人工智能高度自主地处理日常运营,但始终在明确设定的参数范围内进行。

Stage 3: Strategic Oversight. In our current operation, human involvement focuses primarily on strategic direction and exceptional cases. The AI handles routine operations with high autonomy, but always within clearly defined parameters.

这种渐进式模型使组织能够在保持适当安全措施的同时,增强对人工智能代理的信心。其核心洞见在于,人机协作并非非此即彼的选择,而是一个根据实际表现不断演进的连续谱。对于每个代理部署,我们现在都绘制出从高度监管到战略监管的演进路径,并明确了各阶段之间的转换标准。

This progressive model allows organizations to build confidence in their AI agents while maintaining appropriate safeguards. The key insight is that human-AI collaboration isn’t a binary choice but a spectrum that evolves based on demonstrated performance. For every agent implementation, we now map out this progression from high oversight to strategic oversight, with clear criteria for advancing between stages.

正如一位客户所说:“一开始就采取严格的监管措施,并非是对技术缺乏信任,而是为了给团队时间进行调整,同时确保业务连续性。”这种方法显著提高了我们代理部署的采用率和长期成功率。

As one client noted, “Starting with high oversight wasn’t about lack of trust in the technology—it was about giving our team time to adapt while ensuring business continuity.” This approach has significantly improved adoption rates and long-term success of our agent implementations.

扩展规划:确保代理能够应对增长

Planning for Scaling: Ensuring the Agent Can Handle Growth

一个设计良好的 AI 代理应该能够随着需求增长而扩展——处理更多用户、更大的数据集和更复杂的任务,且性能不下降。许多人工智能项目失败并非因为它们不好用,而是因为它们无法高效扩展。

A well-designed AI agent should be able to grow with demand—handling more users, larger datasets, and increased complexity without degradation in performance. Many AI projects fail not because they don’t work, but because they can’t scale efficiently.

可扩展性规划包括:

Scalability planning involves:

负载测试:代理能否在不降低速度的情况下处理 10 倍以上的用户数量?

Load Testing: Can the agent handle 10x more users without slowing down?

并行处理:它能否将任务分配到多个服务器或模型上?

Parallel Processing: Can it distribute tasks across multiple servers or models?

Cost Optimization: Is the infrastructure cost-effective at scale, or does performance come at an unsustainable price?

For example, an AI customer support bot handling 1,000 queries per day might perform well, but if it scales to 100,000 daily interactions, it could slow down or become too expensive to operate. Ensuring it can distribute workload effectively—through serverless architectures, caching strategies, and multi-agent coordination—is key to long-term viability.

Those interested in AI scaling strategies can explore AWS Auto Scaling,165 Google Cloud AI infrastructure best practices,166 and NVIDIA’s AI deployment frameworks for handling large-scale inference workloads.167

The Power of Simplicity

One crucial lesson we learned about reliability was the value of simplicity. Early versions of our newsletter agentic system included complex recovery procedures and elaborate fallback mechanisms. Over time, we found that simpler, well-tested processes were often more reliable than sophisticated ones.

“The most reliable systems,” Rakesh often says, “are those that have fewer things that can go wrong.” This principle guided us to streamline our processes and eliminate unnecessary complexity wherever possible.

Using LLM Chatbots to Build Comprehensive Specifications

We’ve found that AI Chatbots like ChatGPT, Gemini, or Claude are invaluable in developing these detailed definitions and parameters, whether it is to define in detail the identity of an agent, define the right API parameters, or design fallback strategies. However, the key is knowing how to prompt them effectively. We like to use a method we call “Progressive Definition Refinement”:

First, we ask the LLM to generate a basic role description. Then, we progressively probe for potential issues, edge cases, and failure modes. For example:

Initial prompt: “What should a content summarization agent consider when processing news articles?”

Follow-up prompts: “What could go wrong with each of these considerations?” “How should the agent handle each type of failure?” “What metrics would indicate the agent is performing well?”

Each response helps us build a more comprehensive definition, which we then validate against real-world scenarios.
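
The refinement loop can be scripted against any chat-completion API. In this sketch, `ask` is a placeholder for whatever client function you use; the follow-up prompts are the ones quoted above, and the transcript is replayed so each answer builds on the last.

```python
FOLLOW_UPS = [
    "What could go wrong with each of these considerations?",
    "How should the agent handle each type of failure?",
    "What metrics would indicate the agent is performing well?",
]


def refine_definition(ask, initial_prompt: str) -> list[str]:
    """Run the initial prompt, then probe it with each follow-up,
    keeping the full transcript so every answer builds on the last."""
    transcript = [initial_prompt, ask(initial_prompt)]
    for follow_up in FOLLOW_UPS:
        # Replay the whole conversation so the model refines, not restarts.
        context = "\n".join(transcript) + "\n" + follow_up
        transcript += [follow_up, ask(context)]
    return transcript
```

In practice, you would wrap your provider's client in `ask` and feed the final transcript back into the agent's specification document.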

Summary of the Agent Framework

The A.G.E.N.T. framework provides a structured methodology for building AI agents that are reliable, scalable, and effective. Below is a matrix summarizing the key components, making it easy to implement and apply.

Table 8.5: Summary of the Agent Framework (Source: © Bornet et al.)

The Outcome of Our Newsletter Agentic Project

The introduction of the agent system has completely transformed the newsletter creation process, slashing the workload from over 10 hours per week to less than 2 hours—a remarkable 80% reduction in time spent. Tasks that once demanded painstaking manual effort, like searching for articles, summarizing them, and formatting the newsletter, are now seamlessly automated. This allows the human reviewer to focus exclusively on high-value decisions, such as selecting the best articles and finalizing the content for publication.

Beyond efficiency, the system has delivered a noticeable leap in quality. Previously, inconsistencies in tone and errors in formatting were common, but now, the output is polished and professional every time. The agents ensure a cohesive voice that resonates with the audience while reliability has skyrocketed. Summaries and newsletters are delivered on time, every time, without the delays that plagued the manual process.

Perhaps the most impressive benefit is scalability. The system can easily handle an increased workload—processing 50% more articles if needed—without requiring additional human effort. This combination of time savings, enhanced quality, and adaptability has not just optimized the workflow but set a new benchmark for efficiency and excellence in content creation.

The success of our newsletter system, growing to 300,000 subscribers in just a month, wasn’t just about having sophisticated AI—it was about having meticulously defined AI agents working together in a well-orchestrated system. Each agent knew exactly what to do, how to do it, and what to do when things went wrong.

You can try it—subscribe to our “Agentic Intelligence” newsletter to stay updated with the latest news on this exciting topic. Find it on:

Substack at https://agenticintelligence.substack.com

LinkedIn: via this link: https://www.linkedin.com/newsletters/agentic-intelligence-7293015480007557121

Our Top 20 Implementation Tips for Successful AI Agents

To summarize the key learning from this chapter, we’ve compiled these essential tips to guide your journey from concept to successful deployment:

Step 1: Finding the Right Agentic Opportunities

1. Find Your Sweet Spot: Identify opportunities where three key factors intersect—high impact on your business, feasibility with current technology, and reasonable implementation effort.

2. Recognize Agents’ Inherent Limitations: Keep tasks requiring genuine human creativity, strategic judgment, or emotional intelligence with humans—not every process should be automated.

3. Think Tasks, Not Roles: Remember that agents aren’t employees—with their current capabilities, they excel at specific tasks, not broad roles. One employee might manage five processes; you might need five agents to automate the same work.

4. Start With Documented Processes: The best foundation for an AI agent is a clearly documented process. Existing process documentation often provides the ideal training material: specific steps, tools, decision trees, and example cases.

5. Only Automate Proven Processes: Never automate a process that has never been performed manually. First, prove manually that the process works, and then automate it.

6. Break Complex Problems Down: Use a divide-and-conquer approach, tackling one component at a time rather than building an entire system at once.

Step 2: Defining AI Agents’ Role and Capabilities

1. Defining Agent Goals and Instructions in Detail Is Crucial: Invest time in crafting precise purpose, role, and scope for each agent. Remember that examples are worth a thousand words, and place the most important instructions at the end of your prompts.

2. The Simpler, The Better: More agents, more tools, or more tasks create more complexity, costs, and maintenance challenges. Start minimal and expand gradually.

3. One Tool, One Agent: In most cases, limit each agent to a single, well-defined tool rather than trying to build complex multi-purpose agents. Simplicity leads to reliability.

Step 3: Designing AI Agents for Success

1. Design for Human Collaboration: Build agents that augment human capabilities rather than trying to replace them entirely. Keep humans in the loop for quality assurance and strategic decisions.

2. Integrate Where Users Already Work: Ensure agents operate within existing systems. The best agent is worthless if users find it inconvenient to access.

3. Enable Feedback to Agents: Give agents tools to analyze the results of their actions. They should be able to verify whether their tasks were completed successfully.

4. Standardize Inputs and Outputs: Strictly define the format of all inputs and outputs to prevent errors caused by mismatched data structures.

5. Separate Process Data from Actions: Ensure clean separation between what the agent knows and what it can do. This improves both security and maintainability.
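
Tips 4 and 5 can be enforced with strict, typed contracts at the agent boundary. The sketch below uses plain dataclasses; the field names and limits are illustrative, not the book's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SummarizeRequest:
    """What the agent knows: strictly typed input, no side effects."""
    article_url: str
    max_words: int

    def __post_init__(self):
        # Fail fast at the boundary instead of letting bad data propagate.
        if not self.article_url.startswith(("http://", "https://")):
            raise ValueError("article_url must be an absolute URL")
        if not 10 <= self.max_words <= 500:
            raise ValueError("max_words must be between 10 and 500")


@dataclass(frozen=True)
class SummarizeResult:
    """What the agent produced: an equally strict output contract."""
    summary: str
    source_url: str
```

Because requests are frozen and validated on construction, the "what the agent knows" side stays an inert data record, while actions live in separate code that consumes these records, which is the separation tip 5 asks for.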

Step 4: Implementing Your AI Agents

1. Prioritize Speed Over Perfection: Don’t get stuck searching for the perfect platform. Start with something workable, learn from implementation, and improve iteratively.

2. Design for Failure: Build in robust error handling, circuit breakers, graceful degradation, and human escalation paths. Agents will fail—how they recover matters most.

3. Build Decision Trails: Ensure agents log their reasoning process for every decision, creating accountability and enabling targeted improvements.

4. Collect Continuous Feedback: Implement mechanisms to gather user input and system performance metrics to drive ongoing improvements.

5. Use Progressive Trust Models: Implement staged oversight that gradually reduces human involvement as the agent proves reliable.

6. Test with Real-World Scenarios: Rigorously test against edge cases and unexpected inputs before deployment.

7. Accept Iteration as Inevitable: No agent works perfectly on the first try. Plan for multiple refinement cycles as part of your implementation timeline.

8. Deploying Agents Is a Lot Harder Than Building Them: Integration challenges often exceed development complexity. Allocate at least as much time and resources to deployment as to initial development.

9. Start Small, Then Scale: Begin with the smallest component that can deliver value, prove its worth, and then expand systematically.
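
Several of these tips (designing for failure, progressive trust, human escalation) meet in the circuit-breaker pattern named in tip 2. A minimal sketch; the threshold of consecutive failures is an arbitrary assumption to tune per agent.

```python
class CircuitBreaker:
    """Stop calling a failing action and escalate to a human fallback
    after `threshold` consecutive failures."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.failures = 0
        self.open = False  # open circuit = stop trying, escalate instead

    def call(self, action, fallback):
        if self.open:
            return fallback()
        try:
            result = action()
            self.failures = 0  # success resets the failure streak
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.open = True  # trip: future calls go straight to fallback
            return fallback()
```

Here `fallback` might queue the task for human review, which gives the agent the graceful degradation and escalation path the tip calls for, without any call ever raising to the end user.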

These tips represent hard-won insights from our work implementing AI agents across organizations of all sizes. While the technology continues to evolve rapidly, these principles have consistently separated successful implementations from failures. By focusing on these fundamentals, you’ll avoid the common pitfalls that have derailed many agent projects.

CHAPTER 9

FROM IDEAS TO INCOME: BUSINESS MODELS FOR THE AGENT ECONOMY

The Birth of Self-Running Businesses: When AI Became an Entrepreneur

In his groundbreaking book “The Coming Wave,” Mustafa Suleyman proposed a fascinating new version of the Turing test.168 Instead of asking whether a machine could fool a human in conversation, he suggested a more practical challenge: could an AI “go make $1 million on a retail web platform in a few months with just a $100,000 investment”? As we gathered that crisp autumn morning to conduct our experiment, we didn’t know we were about to take the first steps toward passing this modern Turing test.

It was October 22, 2024. While the world was still marveling at AI’s ability to write poetry and generate marketing copy, we were about to see something far more transformative: artificial intelligence that could think and act like an entrepreneur.

“Let’s try something crazy,” I suggested to our research team, as we huddled around our monitors in the lab. After years of implementing traditional automation solutions—the kind that follow predetermined paths through computer systems—we were ready to push the boundaries. “Instead of telling AI what to do, let’s give it a business goal and see what happens.”

We decided to use Claude’s Computer Use capability, a powerful tool that allows AI to interact directly with computer systems. The challenge we set was deceptively simple: could the AI agent figure out how to make $10,000 without human intervention? No pre-written scripts, no access to existing business accounts, just a web browser and the ability to write code.

The Experiment Begins: Excitement and Trepidation

The tension in the room was palpable as we initiated the experiment. Tom nervously drummed his fingers on the desk—after decades in the field, he’d seen plenty of AI experiments go sideways. Brian kept checking and rechecking our safety protocols. We’d set up strict boundaries: a $20 initial budget and no access to login-restricted services. Would these constraints prove too limiting? Or would they force the AI to think more creatively?

The first thirty minutes were frankly terrifying. The AI moved at an incredible pace, opening multiple browser tabs, writing code snippets, and accessing development tools faster than we could follow. “Should we stop it?” Jochen whispered at one point as the agent began rapidly deploying web services we hadn’t anticipated. But curiosity won over caution, and we let it continue.

From Chaos to Creation: Watching AI Think Like an Entrepreneur

What unfolded over the next few hours would challenge everything we thought we knew about artificial intelligence and business automation. Within an hour of receiving its mission, the AI agent wasn’t just brainstorming—it was building a complete business model. Think about how human entrepreneurs typically work: they identify a problem, devise a solution, and figure out how to monetize it. This is exactly what our AI agent did, but at a speed that left us speechless.

The agent identified a perfect market opportunity: restaurants struggling with the transition to digital menus. Its solution? A sophisticated QR code menu system that went far beyond simple digital menus. This wasn’t just a technical solution; it was a thoughtfully crafted business offering that considered real-world needs and constraints.

The Moment Everything Changed

The most dramatic moment came about three hours into the experiment. The agent had just finished creating its first prototype when it suddenly paused all operations. For seventeen nerve-wracking minutes, nothing happened. We later realized it was running a comprehensive market analysis, something we hadn’t explicitly asked it to do.

When it resumed, the business model had evolved dramatically. “Look at this,” Ray pointed out, leaning forward in his chair. “It’s not just selling QR codes—it’s building an entire restaurant analytics ecosystem.” The agent had transformed what started as a simple digital menu system into a sophisticated business intelligence platform.

What fascinated us most was watching the agent’s reasoning process. When faced with potential objections about trust in digital-only menus, it didn’t just stick to its original plan—it evolved the solution. The agent added features for tracking peak dining hours and analyzing customer preferences, transforming a simple menu system into a business intelligence tool that could help restaurant owners optimize their operations.

The Rise of Level 3 Agentic AI

To understand the significance of what we witnessed, it’s important to understand where this fits in the evolution of AI agents. This demonstration represented what we call a Level 3 AI agent—a system that can understand complex instructions, reason in sophisticated ways, and orchestrate multiple tools to achieve goals. Unlike the rigid, rule-based automation systems of the past (Level 1) or even the more flexible intelligent automation systems (Level 2), this agent showed genuine problem-solving capabilities.

Think of it this way: if traditional automation is like teaching a robot to follow a recipe, what we witnessed was more like watching a chef create new dishes based on available ingredients and customer preferences. The agent didn’t just execute pre-programmed instructions—it created, adapted, and refined its approach based on market needs and potential challenges.

The Business Model Takes Shape

The most impressive aspect wasn’t just the technical solution, but how the agent crafted a complete business strategy. It developed a two-tier pricing model: $99 for basic service and $200 for premium features. This wasn’t arbitrary pricing—the agent had analyzed the market, considered the value proposition, and structured its offering to appeal to small and medium-sized restaurants that lacked technical expertise but needed digital solutions.

However, not everything went smoothly. Around hour five, we discovered a serious flaw in the payment processing system the agent had created. It had overlooked crucial security protocols in its rush to get to market. This highlighted one of the key limitations of current AI systems—while they can move incredibly fast, they sometimes miss critical details that human entrepreneurs would instinctively consider.

Beyond the Experiment: What This Means for the Future

This experiment revealed something profound about the future of business and automation. We’re moving beyond the era of AI as a tool and into the age of AI as an autonomous business creator. While our experiment focused on a relatively simple business model, it demonstrated the potential for AI agents to identify opportunities, create solutions, and adapt those solutions based on real-world constraints—all without human intervention.

However, it’s important to note where current technology stands. While our agent showed remarkable capabilities in business ideation and technical implementation, we’re still at Level 3 in the Agentic AI Progression Framework. This means that while the agent can orchestrate complex workflows and make sophisticated decisions, it still lacks true adaptive learning capabilities and complete autonomy.

The Road Ahead

As we wrapped up our experiment after eight intense hours, we realized we’d witnessed something approaching Suleyman’s modern Turing test. While we hadn’t reached the million-dollar mark he proposed, we’d demonstrated that AI could independently conceive and launch a viable business model. The implications were staggering.

As we watch this technology evolve, we can imagine a future where AI agents don’t just support businesses—they create and run them. This isn’t about replacing human entrepreneurs; it’s about creating new possibilities for human-AI collaboration in business creation and operation.

Think about the implications: businesses that could operate 24/7, continuously optimizing their operations and adapting to market changes. Entrepreneurs could launch multiple ventures simultaneously, with AI agents handling the day-to-day operations while humans focus on strategy and innovation.

Learning from the Experiment

This experiment taught us several crucial lessons about the future of AI agents in business:

1. Autonomous AI can think strategically, not just execute tasks

2. AI agents can adapt their solutions based on real-world constraints and feedback

3. The future of business automation isn’t just about efficiency—it’s about creation and innovation

The day AI learned to create a business wasn’t just another milestone in artificial intelligence; it was a glimpse into a future where the line between human and artificial intelligence in business becomes increasingly intertwined. While we’re still in the early stages of this revolution, one thing is clear: the future of entrepreneurship will be shaped by our ability to collaborate with these increasingly capable AI agents.

The question is no longer whether AI can run a business, but how we can best harness this capability to create new opportunities for human creativity and innovation. Welcome to the age of the AI entrepreneur—where businesses can truly run themselves while you sleep.

Emerging Business Models in the Age of Agentic AI

The rise of agentic AI isn’t just creating new tools—it’s reshaping the very fabric of how businesses operate. The dawn of agentic AI is rewriting the rules of business, creating fertile ground for unprecedented models, opportunities, and trends. This shift is not just technological but also economic, cultural, and deeply human—a fusion that promises to reshape industries in ways we are only beginning to grasp. Businesses must now pivot from merely integrating AI to positioning it as a co-pilot that drives innovation, decision-making, and value creation.

Agent-as-a-Service: Delivering Outcomes

A powerful new model is emerging in AI: Agent-as-a-Service. Traditional Software-as-a-Service (SaaS) platforms have primarily offered tools and algorithms that require users to manage and integrate them manually. However, agentic AI introduces autonomy, shifting the focus from providing capabilities to delivering outcomes. Businesses subscribing to these services are no longer just buying a set of tools—they are purchasing results.

To make this concept clearer, consider the difference between traditional SaaS and Agent-as-a-Service. In the traditional SaaS model, a company subscribes to a marketing automation platform, but employees must still configure email campaigns, analyze performance metrics, and manually adjust targeting. With an Agent-as-a-Service model, the business is not just paying for software but for an autonomous AI agent that designs, personalizes, and optimizes marketing campaigns automatically, requiring only high-level input from the user.

This shift from manual software operation to AI-driven execution is already being applied across industries. In real estate, firms no longer need to use multiple software tools to list properties, respond to inquiries, and schedule viewings. Instead, they can pay for an AI agent that automates the entire process, from writing property descriptions to booking client meetings. Even customer support is being transformed, as small businesses that previously relied on human representatives can now pay for AI-driven agents that handle inquiries, process orders, and escalate complex issues when necessary.

Agent-as-a-Service is similar to hiring a human expert on Fiverr or Upwork, but instead of paying a freelancer to complete a task, you’re paying an AI agent to deliver the outcome autonomously. For example, on Fiverr, you might hire a human expert to research and summarize news articles for your newsletter, write product descriptions for your e-commerce store, or manage your social media posts.

With Agent-as-a-Service, you’re essentially hiring an AI-powered agent to do the same, but instantly, on-demand, and often at a lower cost. Instead of managing tools yourself, you simply pay per use for the AI to generate a finished product—whether it’s a newsletter, a real estate listing, a market report, or an automated marketing campaign.
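
The difference from seat-based SaaS licensing shows up directly in the billing logic: you meter finished deliverables, not user accounts. The sketch below illustrates this with entirely hypothetical deliverable types and prices.

```python
# Hypothetical per-deliverable prices, in USD; a real service would
# load these from its catalog.
PRICE_PER_DELIVERABLE = {
    "newsletter": 5.00,
    "real_estate_listing": 2.50,
    "market_report": 12.00,
}


class UsageMeter:
    """Bill per finished outcome, the unit of value in Agent-as-a-Service."""

    def __init__(self):
        self.ledger: list[tuple[str, float]] = []

    def record(self, deliverable: str) -> float:
        """Log one completed deliverable and return its price."""
        price = PRICE_PER_DELIVERABLE[deliverable]
        self.ledger.append((deliverable, price))
        return price

    def invoice_total(self) -> float:
        return round(sum(price for _, price in self.ledger), 2)
```

Nothing in this model charges for access or seats: a customer who generates no deliverables in a month owes nothing, which is precisely the outcome-based framing described above.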

The key difference? AI agents don’t replace expertise; they scale it. Instead of waiting for a freelancer to deliver work, businesses and individuals can access AI-driven expertise instantly, at any time, and at a fraction of the cost of human labor.

This is exactly the vision we have for our newsletter agentic system: an on-demand AI-powered service where businesses and individuals pay per use to generate a fully curated, AI-driven newsletter edition. Instead of manually gathering sources, summarizing articles, and formatting content, users would simply input their preferences, and the system would autonomously create a high-quality newsletter ready for distribution. By adopting an Agent-as-a-Service model, we aim to turn newsletter creation into a seamless, fully automated process, delivering value with every use while allowing for scalability and customization.

The Rise of AI Agent Marketplaces

Perhaps the most exciting development on the horizon is the formation of structured ecosystems in the form of AI agent marketplaces, where businesses and individuals can access Agent-as-a-Service offerings.

At their core, AI agent marketplaces are digital platforms where businesses and individuals can offer or consume Agent-as-a-Service. Think of them as app stores for AI workers: centralized hubs where you can “hire” digital employees with specialized skills, from content creation to data analysis, software development to customer service.

What makes these marketplaces uniquely powerful is their ability to leverage network effects. Each interaction between a business and an agent creates data that improves not just that specific agent but potentially the entire ecosystem. Agents specializing in similar domains can learn from each other’s experiences, creating a compounding value proposition that becomes increasingly difficult for competitors to replicate. The marketplace that achieves critical mass first in a particular domain will likely establish a formidable advantage.

We’re already seeing the first wave of these marketplaces emerge. Platforms like Enso offer hundreds of specialized “AI agent freelancers” that handle everything from LinkedIn content writing to SEO optimization at a fraction of human costs.169 “We implemented Enso’s marketing agents for our e-commerce business and saw a 40% increase in engagement within the first month,” shares Michael Chen, founder of Velvet Home Goods. “What’s most impressive is that the agents keep improving—they understand our brand voice better with each campaign, something we’d never achieve with one-off freelance projects.”

Fiverr Go 正在扩展传统的自由职业者市场模式,将由其人类自由职业者训练的人工智能代理纳入其中。其他一些平台,例如 Taskade AI 和 Sourcegraph Cody,则专注于开发和编码方面的协助。有些平台提供单个代理服务,而另一些则提供由专业代理组成的“团队”,共同协作完成复杂任务。定价模式从订阅制到按绩效付费不等,但其核心价值主张始终如一:以远低于传统成本的价格,按需提供专业服务。

Fiverr Go is extending the traditional freelance marketplace model to include AI agents trained by their human gig workers.170 Others, like Taskade AI and Sourcegraph Cody, focus on development and coding assistance.171 Some offer individual agent services, while others provide “teams” of specialized agents that collaborate on complex tasks. The pricing models vary from subscription-based access to performance-based compensation, but the fundamental value proposition remains consistent: specialized capabilities available on demand, at a fraction of the traditional cost.

对于企业领导者和创业者而言,这些新兴市场不仅代表着获取能力的新途径,还可能带来新的商业模式和收入来源。我们曾与一些机构合作,这些机构基于自身的专有专业知识创建了专业代理,然后通过市场平台将这些代理变现——本质上是将他们的知识产权转化为能够持续产生收入的数字员工。这种方法使企业能够将影响力扩展到仅依靠人工顾问或服务提供商所能达到的程度。

For business leaders and entrepreneurs, these emerging marketplaces represent not just a new way to access capabilities but potentially new business models and revenue streams. We’ve worked with a few organizations that have created specialized agents based on their proprietary expertise, then monetized these agents through marketplace platforms—essentially turning their intellectual property into digital workers that generate continuous revenue. This approach allows businesses to scale their impact far beyond what would be possible with human consultants or service providers alone.

最具前瞻性的组织正在制定针对这些市场的战略方法——他们不仅将自身视为代理服务的消费者,也将其视为潜在的服务提供商。他们正在思考:我们拥有哪些专业知识可以编码进代理并提供给其他组织?哪些独特的数据资产可以使我们的代理比通用代理更有价值?我们如何在自身行业或领域内创建专属的、针对特定代理能力的微型市场?

The most forward-thinking organizations are developing strategic approaches to these marketplaces—not just as consumers of agent services but as potential providers. They’re asking: What specialized knowledge do we possess that could be encoded into agents and offered to others? What unique data assets could make our agents more valuable than generic alternatives? How might we create our own micro-marketplaces for specialized agent capabilities within our industry or domain?

无论您是希望获得以前无法企及的能力的小企业,还是希望将专有技术转化为新收入来源的企业,这些市场都代表着人工智能代理革命带来的最重要商业机遇之一。

Whether you’re a small business gaining access to capabilities previously beyond reach, or an enterprise turning proprietary expertise into new revenue streams, these marketplaces represent one of the most significant business opportunities of the AI agent revolution.

微型企业和去中心化模式

Micro-Enterprises and Decentralized Models

另一个正在兴起的趋势是“微型企业”的崛起,这些企业由人工智能驱动。这些规模小、高度专业化的企业利用人工智能代理以极少的人工干预运营。试想一下,一家一人公司运营着一家全球电子商务商店,人工智能代理负责处理从产品采购、客户服务到物流的一切事务。创业门槛正在降低,由此引发的创新浪潮挑战着传统的规模和层级观念。

Another trend gaining momentum is the rise of “micro-enterprises” powered by agentic AI. These are small, highly specialized businesses that leverage AI agents to operate with minimal human intervention. Imagine a one-person company running a global e-commerce store, with AI agents handling everything from sourcing products to customer service to logistics. The barriers to entrepreneurship are falling, creating a wave of innovation that challenges traditional notions of scale and hierarchy.

与此同时,大型企业正在探索由智能体人工智能驱动的去中心化运营模式。这些企业不再采用集中式决策,而是在业务的不同节点部署人工智能智能体。每个智能体都以半自主的方式运行,在其职责范围内做出决策,并将数据反馈给中央系统。这种去中心化模式提升了敏捷性,使企业能够以前所未有的速度应对市场变化。

In parallel, large enterprises are exploring decentralized operating models enabled by agentic AI. Instead of centralized decision-making, these organizations are deploying AI agents at different nodes of the business. Each agent operates semi-autonomously, making decisions within its domain while feeding data back to the central system. This decentralization fosters agility, enabling businesses to respond to market changes with unprecedented speed.

代理经济中的市场机遇

Market Opportunities in the Agent Economy

我们看到最有前景的机遇并非在于创造新的人工智能代理,而在于为新兴的代理经济开发基础设施和支持系统。正如电子商务的兴起为支付处理、物流和数字营销创造了机遇一样,代理经济也正在催生对新型服务的需求。

The most promising opportunities we’re seeing aren’t in creating new AI agents but in developing the infrastructure and support systems for the emerging agent economy. Just as the rise of e-commerce created opportunities in payment processing, logistics, and digital marketing, the agent economy is creating demands for new types of services.

我们看到以下方面尤其有潜力:

We’re seeing particular promise in:

代理编排平台可帮助企业协调多个人工智能代理。

Agent orchestration platforms that help businesses coordinate multiple AI agents

为现有代理提供培训和优化服务

Training and optimization services for existing agents

面向代理驱动系统的安全和治理框架

Security and governance frameworks for agent-driven systems

帮助代理与遗留系统协同工作的集成服务

Integration services that help agents work with legacy systems
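The first of these infrastructure opportunities — agent orchestration — can be sketched in code. The sketch below is illustrative only: a minimal router that dispatches typed tasks to whichever specialist agent is registered for them. The agent names and behaviors are hypothetical stand-ins for real AI services.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Task:
    kind: str      # task category, e.g. "summarize" or "classify"
    payload: str   # the input handed to the specialist agent

class Orchestrator:
    """Routes tasks to registered specialist agents (illustrative design)."""
    def __init__(self) -> None:
        self._agents: Dict[str, Callable[[str], str]] = {}

    def register(self, kind: str, agent: Callable[[str], str]) -> None:
        # Each agent handles exactly one task kind in this sketch.
        self._agents[kind] = agent

    def dispatch(self, task: Task) -> str:
        agent = self._agents.get(task.kind)
        if agent is None:
            raise ValueError(f"no agent registered for {task.kind!r}")
        return agent(task.payload)

# Hypothetical specialist agents standing in for real AI services
orc = Orchestrator()
orc.register("summarize", lambda text: text[:20] + "...")
orc.register("classify", lambda text: "positive" if "good" in text else "neutral")

print(orc.dispatch(Task("classify", "good quarter for sales")))  # positive
```

A real orchestration platform would add queuing, retries, monitoring, and cross-agent handoffs, but the routing core looks much like this.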

垂直行业人工智能代理:变革行业工作流程

Vertical AI Agents: Transforming Industry Workflows

垂直行业人工智能代理代表着特定行业工作流程的革命性飞跃,它们不再仅仅是工具,而是将整个运营团队压缩到软件中。与通用人工智能解决方案不同,这些代理高度专业化,能够深入理解并自主执行其领域内的任务。例如,专为软件开发中的质量保证而设计的垂直行业人工智能代理,不仅能够辅助质量保证团队,还能完全取代他们,自动执行测试、诊断缺陷并迭代修复,无需人工干预。

Vertical AI agents represent a revolutionary leap for specific industry workflows, where AI systems are not just tools but entire operational teams compressed into software. Unlike generic AI solutions, these agents are hyper-specialized and capable of deeply understanding and autonomously executing tasks within their domain. For instance, a vertical AI agent designed for quality assurance in software development doesn’t merely assist the QA team; it replaces it entirely, conducting automated tests, diagnosing bugs, and iterating on fixes without human oversight.
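The QA agent described above follows a loop: run tests, diagnose the first failure, apply a candidate fix, and retry. A hedged sketch of that loop, assuming a toy "codebase" represented as a config dict — the tests and fixers here are hypothetical stand-ins for real test suites and AI-generated patches:

```python
def qa_agent(code, tests, fixers, max_iterations=5):
    """Illustrative autonomous QA loop: test, diagnose, fix, repeat."""
    for _ in range(max_iterations):
        failures = [name for name, test in tests.items() if not test(code)]
        if not failures:
            return code, True          # all checks pass
        fix = fixers.get(failures[0])  # remediate the first failure found
        if fix is None:
            break                      # no known remediation; escalate to humans
        code = fix(code)
    return code, False

# Toy "codebase" as a config dict; tests and fixers are hypothetical
code = {"timeout": -1, "retries": 3}
tests = {
    "timeout_positive": lambda c: c["timeout"] > 0,
    "retries_bounded": lambda c: 0 <= c["retries"] <= 10,
}
fixers = {"timeout_positive": lambda c: {**c, "timeout": 30}}

fixed, ok = qa_agent(code, tests, fixers)
print(ok, fixed["timeout"])  # True 30
```

The escalation branch matters: a vertical agent that cannot remediate a failure should hand it to a human rather than loop indefinitely.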

垂直人工智能代理的例子包括:

Examples of Vertical AI Agents include:

医疗账单代理:能够自主处理诊所医疗索赔的人工智能系统,减少错误并消除人为瓶颈。

Medical Billing Agents: AI systems that autonomously process medical claims for clinics, reducing errors and eliminating human bottlenecks.

客户支持代理:由人工智能驱动的支持系统,专为零售或科技等行业量身定制,通过深入的上下文理解解决查询。

Customer Support Agents: AI-powered support systems tailored to industries like retail or tech, resolving queries with deep contextual understanding.

政府合同代理:能够搜索数据库查找 RFP(征求建议书),自动起草回复并提交申请的系统。

Government Contracting Agents: Systems that scour databases for RFPs (Request for Proposals), autonomously draft responses, and submit applications.

这些智能体不仅取代了人力,还重新定义了企业的运营规模。如今,小型公司可以通过部署智能体,以远超其规模的能力与大型企业竞争。此外,它们通过让企业能够在无需大量额外成本的情况下进行优化、自动化和扩展,从而解锁新的收入来源。

These agents don’t just replace human effort; they redefine the scale at which businesses operate. Small companies can now compete with large enterprises by deploying agents that allow them to function at a disproportionate capacity. Moreover, they unlock new revenue streams by offering businesses the ability to optimize, automate, and scale without significant overhead.

企业家可以通过识别行业内高成本、重复性的流程,并构建专业的垂直领域代理来占领这些市场,从而获得成功。成本降低和效率提升带来的直接效益将推动快速普及,并重塑行业预期。

Entrepreneurs can find success by identifying high-cost, repetitive processes within industries and building specialized vertical agents to capture these markets. The immediate benefits—cost reduction and efficiency gains—will drive rapid adoption and reshape industry expectations.

人工智能驱动的生态系统平台

AI-Driven Ecosystem Platforms

智能体人工智能正在催生生态系统平台,这些平台如同数字枢纽,使企业、供应商和消费者能够无缝互动。这些平台利用人工智能代理自主管理诸如买卖双方匹配、合同谈判和合规性监控等任务。例如,在物流领域,生态系统平台可以将托运人和承运人连接起来,动态协商运费,并实时优化路线。在创意产业,这些平台可以将艺术家、品牌和消费者聚集在一起,由人工智能处理合作和版税事宜。这种互联互通的方式促进了跨行业的创新、效率和协作。

Agentic AI is giving rise to ecosystem platforms that function as digital hubs where businesses, suppliers, and consumers interact seamlessly. These platforms leverage AI agents to autonomously manage tasks such as matchmaking between buyers and sellers, contract negotiations, and compliance monitoring. For instance, in logistics, an ecosystem platform might connect shippers with carriers, dynamically negotiate rates, and optimize routes in real-time. In creative industries, these platforms can bring together artists, brands, and consumers, with AI handling collaborations and royalties. This interconnected approach fosters innovation, efficiency, and collaboration across sectors.

新市场和数字伴侣

New Markets and Digital Companions

智能体人工智能带来的最深刻变革之一,便是催生了全新的市场。以“数字伴侣”为例,这些人工智能实体并非简单的聊天机器人,而是能够与用户建立有意义关系的自适应、情境感知型智能体。数字伴侣的应用领域涵盖心理健康支持、老年护理,甚至个人发展指导。在日益数字化的世界中,这些智能体或许将重新定义“陪伴”的概念。

One of the most profound shifts brought by agentic AI is the emergence of entirely new markets. Take the concept of “digital companions,” for example. These AI entities are not mere chatbots; they are adaptive, context-aware agents capable of forming meaningful relationships with users. Digital companions are finding applications in mental health support, elder care, and even personal development coaching. These agents could potentially redefine the very notion of companionship in an increasingly digital world.

代理间经济:构建下一代经济平台

The Agent-to-Agent Economy: Building the Next Economic Platform

想象一下,你或你的公司拥有一个数字化版本,可以代表你周旋于世间,做出决策,处理谈判,并以与你相同的细致入微和深刻理解来管理你的日常生活。无论代表公司还是个人,这些智能体都能自主地相互谈判,从而创造出一种全新的经济活动模式,它以机器的速度运行,同时又尊重人类设定的界限。这并非科幻小说——我们称之为“智能体间经济”,它代表着智能体人工智能领域最具变革性的机遇之一。

Imagine having a digital version of yourself or your company that could navigate the world on your behalf, making decisions, handling negotiations, and managing your daily life with the same nuance and understanding that you would. Whether representing a company or individuals, these agents would negotiate with each other autonomously, creating a new layer of economic activity that operates at machine speed while respecting human-defined boundaries. This isn’t science fiction—it’s what we call the Agent-to-Agent Economy, and it represents one of the most transformative opportunities in agentic AI.

在最近与一家制造公司合作的实施项目中,我们见证了一件令人瞩目的事情。起初,我们部署了智能代理来处理采购,并设定了自动采购的特定阈值。但我们始料未及的是,这些智能代理会开始优化与其他公司系统的交互,从而发现人工买卖双方都忽略的效率提升点。

During a recent implementation project with a manufacturing company, we witnessed something remarkable. We had initially deployed agents to handle procurement, setting specific thresholds for automatic purchases. What we didn’t expect was how these agents would begin optimizing their interactions with other companies’ systems, finding efficiencies that human buyers and sellers had missed.

这揭示了真正的机遇:不仅是为个人或企业创建个体代理,而且是构建一个完整的经济生态系统,在这个生态系统中,数字代理可以根据预先定义的参数自主进行交易和互动。

This revealed the true opportunity: not just in creating individual agents for people or businesses, but in building an entire economic ecosystem where digital agents can transact and interact autonomously based on pre-defined parameters.

想象一下,您经营一家公司,需要在特定条件下持续采购物料(例如原材料零部件)——特定的价格门槛、交货时间和质量标准。一旦您公司的AI代理理解了这些要求,它就能自主识别符合您标准的供应商代理,协商条款,甚至代表您完成交易。以往需要人工团队调研供应商、多次邮件或电话沟通并最终签订合同的流程,如今都可以由您的数字代理无缝完成。这使您的公司能够以前所未有的速度和效率运营,从而解放人力资源,使其专注于战略任务,而非后勤琐事。

Imagine running a company where you’re constantly in need of materials, such as raw components, under specific conditions—certain price thresholds, delivery timelines, and quality standards. Once your company’s AI agent understands these requirements, it can autonomously identify vendor agents that meet your criteria, negotiate terms, and even execute transactions on your behalf. The process, which would typically require human teams to research suppliers, exchange multiple emails or calls, and finalize contracts, is now handled seamlessly by your digital representative. This allows your company to operate at unprecedented speed and efficiency, freeing up human resources to focus on strategic tasks rather than logistical minutiae.
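The vendor-matching step described above reduces, at its simplest, to filtering quotes against the buyer agent's thresholds and ranking the survivors. A minimal sketch — the vendor data and thresholds below are hypothetical:

```python
def match_vendors(vendors, max_price, max_days, min_quality):
    """Filter vendor offers against buyer thresholds, rank by price."""
    eligible = [
        v for v in vendors
        if v["price"] <= max_price
        and v["delivery_days"] <= max_days
        and v["quality"] >= min_quality
    ]
    return sorted(eligible, key=lambda v: v["price"])

offers = [  # hypothetical quotes gathered from vendor agents
    {"name": "A", "price": 9.5, "delivery_days": 5, "quality": 0.97},
    {"name": "B", "price": 8.0, "delivery_days": 12, "quality": 0.99},
    {"name": "C", "price": 10.0, "delivery_days": 4, "quality": 0.95},
]
best = match_vendors(offers, max_price=10.0, max_days=7, min_quality=0.95)
print([v["name"] for v in best])  # ['A', 'C']
```

In a live agent-to-agent setting, the negotiation layer would sit on top of this filter: the buyer agent would counter-offer to the ranked vendors rather than simply accept the cheapest quote.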

在个人层面,同样的道理也适用。想象一下,你的私人人工智能助手可以帮你处理日常购物和财务决策。如果你想买菜,你的助手可以和批发商谈判,确保以最优惠的价格买到符合你饮食偏好的商品。它还可以审核你的汽车保险单,并与保险公司协商更优惠的条款,帮你省钱省时。即使是寻找梦寐以求的服装,你的助手也能在全球范围内搜索,找到远在世界另一端的小商家,找到完全符合你心意的款式,确保购买过程轻松便捷,完全满足你的愿望。

On a personal level, the same concept applies. Imagine your personal AI agent handling everyday purchases and financial decisions. If you’re looking to buy groceries, your agent could negotiate with wholesale vendors to secure the best prices while ensuring the items meet your dietary preferences. It could also review your car insurance policy and negotiate better terms with insurers, saving you both money and time. Even in scenarios like finding your dream outfit, your agent could scour the globe, identifying a small vendor on the other side of the world who crafts exactly what you’re looking for, ensuring the purchase is effortless and tailored to your desires.

另外值得一提的是,这种交易方式的转变不仅让生活更加便捷,也彻底重塑了营销格局,这一点也令人着迷。当个人和商业代理人充当中间人时,公司需要说服购买其产品的不再是消费者,而是代理人本身。

On a side note, it is also fascinating to understand that this shift in how transactions occur doesn’t just make life more convenient; it also completely reinvents the landscape of marketing. When personal and business agents act as intermediaries, the entities that companies need to convince to buy their products are no longer human consumers, but the agents themselves.

真正的变革机遇在于构建能够让这些代理无缝交互的基础设施。这可以比作构建代理经济的“操作系统”——一个促进代理间交互的沟通、交易和治理的平台。该生态系统的基础需要标准化的代理通信协议、用于自动执行交易的智能合约,以及强大的信誉、信任、支付和结算系统。例如,智能合约框架可以实现供应链协议的自动化,确保只有在满足交付条件后才能支付款项。信誉系统则可以帮助代理在进行交易前评估对方的信誉度。

The real transformative opportunity lies in creating the infrastructure that enables these agents to interact seamlessly. This can be likened to building the “operating system” for the agent economy—a platform that facilitates communication, transactions, and governance for agent-to-agent interactions. The foundation of this ecosystem requires standardized protocols for agent communication, smart contracts for automated deal execution, and robust systems for reputation, trust, payment, and settlement. For example, smart contract frameworks could automate supply chain agreements, ensuring payments are released only when delivery conditions are met. Reputation systems would help agents assess trustworthiness before engaging in transactions.
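Production smart contracts would be written for a blockchain runtime (for example in Solidity), but the conditional-release logic described above can be sketched in plain Python. This is a minimal escrow sketch under those assumptions, not a real contract:

```python
class EscrowContract:
    """Minimal escrow sketch mirroring a smart contract: funds are locked
    on creation and released to the seller only once the delivery
    condition is confirmed."""
    def __init__(self, amount: float):
        self.locked = amount      # funds held by the contract
        self.delivered = False    # set by an oracle or delivery confirmation
        self.released = False

    def confirm_delivery(self) -> None:
        self.delivered = True

    def release(self) -> float:
        if not self.delivered:
            raise PermissionError("delivery condition not met")
        self.released = True
        paid, self.locked = self.locked, 0.0
        return paid

contract = EscrowContract(amount=500.0)
contract.confirm_delivery()
print(contract.release())  # 500.0
```

On-chain, the delivery confirmation would come from an oracle or a signed attestation, and the reputation system mentioned above would record whether each counterparty's contracts settled cleanly.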

对于希望从这场变革中获利的企业家来说,关键在于尽早定位,专注于自主代理能够立即创造价值的特定垂直领域,例如供应链优化、房地产或金融服务。通过构建治理框架来建立信任,确保透明度、数据隐私和人工监督,同时为企业和个人创建工具,以便他们能够无缝地采用和管理代理。真正的机遇不仅在于开发代理,更在于建立支撑其交互的基础设施——平台、协议和生态系统,这些都将是代理经济的基石。现在就采取行动,企业家们就能在这个变革性的市场中占据领先地位。

For entrepreneurs looking to capitalize on this revolution, the key is to position yourself early by focusing on specific verticals where autonomous agents can deliver immediate value, such as supply chain optimization, real estate, or financial services. Start by building trust through governance frameworks that ensure transparency, data privacy, and human oversight while creating tools for businesses and individuals to adopt and manage agents seamlessly. The real opportunity lies not just in developing agents but in establishing the infrastructure that powers their interactions—platforms, protocols, and ecosystems that will underpin the agent economy. By acting now, entrepreneurs can secure a leadership position in this transformative market.

加密货币领域智能人工智能的崛起:一种新的范式

The Rise of Agentic AI in Cryptocurrencies: A New Paradigm

想象一下,人工智能不仅分析加密货币市场,还能积极参与其中,自主决策,甚至成为百万富翁。这并非科幻小说,而是2024年已经发生的现实,它正在彻底改变我们对人工智能和加密货币未来的认知。

Imagine a world where artificial intelligence doesn’t just analyze cryptocurrency markets—it actively participates in them, makes its own decisions, and even becomes a millionaire. This isn’t science fiction; it happened in 2024, and it’s transforming how we think about the future of both AI and cryptocurrency.

2024年7月,由开发者安迪·艾瑞(Andy Ayrey)创建的名为“真理终端”(Truth Terminal)的人工智能开始在X平台上发布内容。这款半自主运行的人工智能分享了幽默、存在主义和挑衅性的内容,迅速积累了超过20万的粉丝。172

In July 2024, an AI named Truth Terminal, created by developer Andy Ayrey, began posting on X. This AI, operating semi-autonomously, shared a mix of humorous, existential, and provocative content, quickly amassing over 200,000 followers.172

风险投资家马克·安德森(Marc Andreessen)被其独特的交互方式所吸引,与Truth Terminal展开了合作。经过一系列交流,安德森向该人工智能提供了价值5万美元的比特币资助,旨在探索自主人工智能代理在金融市场中的潜力。173

Intrigued by its unique interactions, venture capitalist Marc Andreessen engaged with Truth Terminal. After a series of exchanges, Andreessen provided a $50,000 grant in Bitcoin to the AI, aiming to explore the potential of autonomous AI agents in financial markets.173

随后,一位匿名开发者推出了一种名为 Goatseus Maximus ($GOAT) 的基于表情包的加密货币,其灵感来源于 Truth Terminal 的内容。该人工智能对 $GOAT 的推广使其价格飙升,市值在几天内达到约 1.5 亿美元。174

Following this, an anonymous developer launched a meme-based cryptocurrency called Goatseus Maximus ($GOAT), inspired by Truth Terminal’s content. The AI’s promotion of $GOAT led to a surge in its market value, reaching approximately $150 million within days.174

因此,Truth Terminal的加密货币持有量显著增长,使其成为加密货币领域首批跻身百万富翁行列的人工智能实体之一。这一发展引发了关于人工智能在金融市场中的伦理影响和未来角色的讨论。175

As a result, Truth Terminal’s cryptocurrency holdings grew significantly, making it one of the first AI entities to achieve millionaire status in the crypto world. This development has sparked discussions about the ethical implications and future roles of AI in financial markets.175

这一非凡的故事凸显了智能体人工智能(即自主、目标导向的人工智能系统)在加密生态系统中创新和动态互动的强大力量。“真理终端”项目证明,人工智能不仅可以创造和影响文化叙事,还能积极参与并塑造金融市场。对于企业家和技术人员而言,这个故事展现了人工智能体与加密货币形成共生关系的未来图景,这种共生关系将推动创新并挑战传统体系。

This remarkable saga highlights the power of agentic AI—autonomous, goal-oriented AI systems—to innovate and interact dynamically within the crypto ecosystem. Truth Terminal demonstrated that AI could not only create and influence cultural narratives but also actively participate in and shape financial markets. For entrepreneurs and technologists, this story offers a glimpse into a future where AI agents and cryptocurrencies form symbiotic relationships, driving innovation and challenging traditional systems.

智能体人工智能与加密货币之间独特的协同效应

The Unique Synergy Between Agentic AI and Cryptocurrencies

智能体人工智能与加密货币可谓天作之合,二者优势互补,相得益彰。加密货币提供了一种去中心化、无需许可的金融基础设施,与人工智能代理的自主特性完美契合。与需要人工身份验证和遵守监管框架的传统金融体系不同,区块链技术使人工智能代理能够独立进行金融交易,从而实现无需人工干预的实时跨境活动。

Agentic AI and cryptocurrencies are a natural fit, each complementing the other’s strengths while addressing its limitations. Cryptocurrencies provide a decentralized, permissionless financial infrastructure that aligns with the autonomous nature of AI agents. Unlike traditional financial systems that require human identification and adherence to regulatory frameworks, blockchain technology allows AI agents to engage in financial transactions independently, enabling real-time, cross-border activities without human intervention.

智能体人工智能通过为区块链交互引入智能和适应性,增强了加密货币的实用性。这些人工智能系统可以:

Agentic AI enhances the utility of cryptocurrencies by bringing intelligence and adaptability to blockchain interactions. These AI systems can:

自主执行复杂交易或管理去中心化金融 (DeFi) 投资组合,从而实现更高效、数据驱动的投资策略,并最大限度地减少人为干预。

Execute complex trades or manage decentralized finance (DeFi) portfolios autonomously, enabling more efficient and data-driven investment strategies with minimal human oversight.

管理代币生态系统,在去中心化自治组织 (DAO) 中扮演公平高效的调解人角色,确保平衡的决策和公平的参与。

Govern token ecosystems, acting as fair and efficient mediators in decentralized autonomous organizations (DAOs), ensuring balanced decision-making and equitable participation.

促进微支付,优化代币化平台的经济模型,为各种应用场景提供无缝、低成本的交易。

Facilitate micropayments and optimize economic models for tokenized platforms, providing seamless, low-cost transactions for a variety of use cases.
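The first capability above — autonomous portfolio management — reduces, at its simplest, to computing the trades that move current holdings to target weights. A minimal sketch with hypothetical positions and prices:

```python
def rebalance(holdings, prices, targets):
    """Compute the trade (in units of each asset) needed to move
    current positions to the target portfolio weights."""
    total = sum(holdings[a] * prices[a] for a in holdings)  # portfolio value
    trades = {}
    for asset, weight in targets.items():
        target_value = total * weight
        current_value = holdings.get(asset, 0.0) * prices[asset]
        # positive = buy this many units, negative = sell
        trades[asset] = (target_value - current_value) / prices[asset]
    return trades

holdings = {"ETH": 2.0, "USDC": 1000.0}   # hypothetical positions
prices = {"ETH": 2000.0, "USDC": 1.0}     # hypothetical spot prices
targets = {"ETH": 0.5, "USDC": 0.5}       # desired 50/50 split

trades = rebalance(holdings, prices, targets)
print(round(trades["ETH"], 3), round(trades["USDC"], 1))  # -0.75 1500.0
```

An on-chain agent would execute these trades through DeFi protocols and account for slippage and gas, which this arithmetic-only sketch ignores.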

这种协同作用为创建去中心化系统带来了前所未有的可能性,这些系统不仅高效,而且具有适应性和弹性。

This synergy unlocks unprecedented possibilities for creating decentralized systems that are not only efficient but also adaptive and resilient.

为什么智能体人工智能和加密技术能够如此完美地结合在一起

Why Agentic AI and Crypto Work So Well Together

把传统银行业想象成一座处处设有严格检查站的城市——每一步都需要出示身份证、填写表格并等待人工审批。现在,把加密货币想象成一座没有检查站、系统自动化的城市——只要你拥有正确的数字密钥,就可以自由通行。这就是为什么像"真理终端"这样的AI代理能够如此自然地与加密货币协同工作。

Think of traditional banking as a city with strict checkpoints everywhere—you need to show ID, fill out forms, and wait for human approval at every turn. Now imagine cryptocurrency as a city with automated systems instead of checkpoints—as long as you have the right digital keys, you can move freely. This is why AI agents like Truth Terminal work so naturally with cryptocurrency.

让我们来分析一下为什么这种合作关系如此有意义:

Let’s break down why this partnership makes so much sense:

首先,加密货币就像数字乐高积木——它们可以移动、组合和构建,无需人工批准每一步。传统银行处理一笔国际转账可能需要几天时间,并且需要多人审核和批准。但在加密货币领域,只要拥有相应的数字权限,人工智能代理就能在几秒钟内转移数百万美元。

First, cryptocurrencies work like digital Lego blocks—they can be moved, combined, and built upon without needing a human to approve each step. Traditional banks might take days to process an international transfer, requiring multiple people to review and approve it. But in the crypto world, an AI agent can move millions of dollars in seconds, as long as it has the correct digital permissions.

其次,所有加密货币交易都会记录在一个公共账本(区块链)上,就像一个巨大的、透明的电子表格,任何人都可以查看。这使得人工智能代理能够随时完美地了解市场动态。想象一下,你拥有一个水晶球,可以实时显示每一笔金融交易——这就是人工智能代理在加密货币世界中所能获取的信息。

Second, all cryptocurrency transactions are recorded on a public ledger (the blockchain), similar to a giant, transparent spreadsheet that everyone can see. This gives AI agents a perfect view of what’s happening in the market at all times. Imagine having a crystal ball that shows every financial transaction happening in real-time—that’s what AI agents have access to in the crypto world.

第三,加密货币可以被编程设定特定的规则和条件(称为智能合约),并在满足特定条件时自动执行。这就像一台自动售货机,它不仅能收钱、卖零食,还能自动补货、根据需求调整价格,甚至自动订购新货。人工智能代理可以与这些智能合约交互,在无需人工干预的情况下创建复杂的金融安排。

Third, cryptocurrencies can be programmed with specific rules and conditions (called smart contracts) that automatically execute when certain conditions are met. It’s like having a vending machine that not only accepts money and gives you snacks but can also restock itself, adjust prices based on demand, and even order new inventory automatically. AI agents can interact with these smart contracts to create complex financial arrangements without human intervention.

智能体人工智能与加密技术的交叉领域蕴藏着诸多机遇

Opportunities at the Intersection of Agentic AI and Crypto

个性化金融服务

Personalized Financial Services

智能体人工智能可以重新定义个人和机构管理财务的方式。例如,人工智能钱包可以自主分配投资、平衡投资组合并精准执行交易。daos.fun 等平台就充分展现了这种能力,该平台利用人工智能代理管理对冲基金,通过数据分析和全天候运营来提升财务业绩。这些代理消除了人为偏见和低效,能够满足独特的财务需求,提供实时调整的定制化方案。

Agentic AI can redefine how individuals and institutions manage their finances. For example, AI-powered wallets can autonomously allocate investments, balance portfolios, and execute transactions with precision. This capability is exemplified by platforms like daos.fun, where AI agents manage hedge funds, leveraging data analytics and 24/7 operational capabilities to improve financial outcomes. By removing human biases and inefficiencies, these agents can cater to unique financial needs, offering a tailored approach that adapts in real-time.

标记化人工智能代理

Tokenized AI Agents

Virtuals.io 等平台凸显了一种新兴机遇:人工智能代理本身成为代币化资产。这不仅使用户能够与人工智能互动,还能分享其运营成功的成果。代币持有者可以影响这些代理的开发和管理,从而为创新和参与创造经济激励。例如,一个人工智能艺术家代理可以创作数字艺术作品,而其代币持有者则可以从作品的受欢迎程度和销售额中获得经济收益,从而促进一种新型的协作所有权和创造力模式。

Platforms like Virtuals.io highlight an emerging opportunity where AI agents themselves become tokenized assets. This allows users to not only interact with AI but also hold stakes in their operational success. Token holders can influence the development and governance of these agents, creating an economic incentive for innovation and engagement. For instance, an AI artist agent could generate digital art while its token holders benefit financially from its popularity and sales, fostering a new form of collaborative ownership and creativity.

去中心化市场

Decentralized Marketplaces

通过将智能体人工智能集成到去中心化市场中,房地产、自由职业服务和供应链管理等行业可以实现前所未有的效率。人工智能代理可以自主协商合同、优化资源分配并简化物流,从而降低成本和人为错误。例如,去中心化电子商务平台中的人工智能代理可以根据供需趋势动态调整价格和库存,从而创建一个无缝高效的交易环境。

By integrating agentic AI into decentralized marketplaces, industries like real estate, freelance services, and supply chain management can achieve unprecedented efficiency. AI agents can autonomously negotiate contracts, optimize resource allocation, and streamline logistics, reducing costs and human error. For example, an AI agent in a decentralized e-commerce platform could dynamically adjust pricing and inventory based on demand and supply trends, creating a seamless and efficient trading environment.
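The dynamic pricing behavior described above might, in its simplest form, look like the rule below. The sensitivity and clamping bounds are illustrative assumptions, not a recommended policy:

```python
def adjust_price(base_price, demand, supply, sensitivity=0.5, floor=0.5, cap=2.0):
    """Scale price with the demand/supply ratio, clamped to protective
    bounds so the agent cannot price-gouge or race to zero."""
    ratio = demand / max(supply, 1)
    multiplier = min(max(1 + sensitivity * (ratio - 1), floor), cap)
    return round(base_price * multiplier, 2)

print(adjust_price(100.0, demand=150, supply=100))  # 125.0 (scarcity premium)
print(adjust_price(100.0, demand=50, supply=100))   # 75.0  (surplus discount)
```

The floor and cap parameters are exactly the kind of human-defined boundaries discussed earlier in this chapter: the agent acts autonomously, but only inside limits its owner set.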

智能基础设施

Smart Infrastructure

智能体人工智能与区块链相结合,有望彻底改变基础设施管理。在智慧城市中,人工智能代理可以自主监测能源消耗、管理交通系统,并基于实时数据优化公共服务。通过自动化这些流程,城市可以显著提升可持续性和效率,充分展现将智能融入基础设施的深远影响。

Agentic AI, when combined with blockchain, has the potential to transform infrastructure management. In smart cities, AI agents can autonomously monitor energy consumption, manage traffic systems, and optimize public services based on real-time data. By automating these processes, cities can achieve significant improvements in sustainability and efficiency, demonstrating the profound impact of integrating intelligence into infrastructure.

在智能体人工智能经济中创造机遇:新应用淘金热

Building Opportunities in the Agentic AI Economy: The New App Gold Rush

回想一下2008年,苹果公司首次推出App Store的时候。大多数人当时只是把它当作在手机上下载游戏和实用工具的途径。很少有人意识到,这竟是一场革命的开端,这场革命将创造数十亿美元的价值,并彻底改变整个行业。如今,我们在智能人工智能领域也正处于类似的转折点。

Think back to 2008, when Apple first launched the App Store. Most people saw it as just a way to get games and simple utilities on their phones. Few recognized it as the beginning of a revolution that would create billions in value and transform entire industries. Today, we’re at a similar inflection point with agentic AI.

正如应用程序已成为人与移动计算之间的接口一样,人工智能代理也正在成为人与复杂业务流程之间的接口。但是,如何才能发现代理经济领域的下一个"优步"或"Instagram"呢?通过我们在数百家机构实施人工智能解决方案的经验,我们开发了一套系统化的方法来识别这些机遇。

Just as apps have become the interface between humans and mobile computing, AI agents are becoming the interface between humans and complex business processes. But how do you spot the next “Uber” or “Instagram” of the agent economy? Through our work implementing AI solutions across hundreds of organizations, we’ve developed a systematic approach to identifying these opportunities.

理解新范式

Understanding the New Paradigm

在最近的一次咨询项目中,一位客户问了我们一个很有意思的问题:“如果人工智能代理是新一代的应用程序,那么智能手机的对应物是什么?” 答案揭示了在这个领域发现机遇的关键所在。人工智能代理的“平台”并非实体设备,而是企业或行业的整个数字化基础设施。

During a recent consulting engagement, a client asked us an intriguing question: “If AI agents are the new apps, what’s the equivalent of the smartphone?” The answer revealed a crucial insight about opportunity spotting in this space. The “platform” for AI agents isn’t a physical device—it’s the entire digital infrastructure of a business or industry.

以我们去年合作过的创业者塔姆拉为例。她最初找到我们是因为她想开发一个人工智能代理来管理社交媒体。然而,在我们帮助她分析市场机遇的过程中,她意识到更大的潜力在于创建一个能够协调多种现有社交媒体工具和服务的代理。如今,她成功的业务并非建立在单一代理之上,而是通过协调多个专业代理来提供全面的社交媒体管理服务。

Take Tamra, an entrepreneur we worked with last year. She initially approached us because she wanted to build an AI agent to handle social media management. However, as we helped her analyze the opportunity, she realized that the bigger potential lay in creating an agent that could coordinate multiple existing social media tools and services. Her successful business today isn’t built on a single agent, but on orchestrating multiple specialized agents to deliver comprehensive social media management services.

三大机遇领域

The Three Horizons of Opportunity


图 9.1:智能体机遇的三大领域(来源:© Bornet 等人)

Figure 9.1: The Three Horizons of Agentic Opportunity (Source: © Bornet et al.)

根据我们的经验,我们总结出了智能体人工智能机遇通常出现的三个不同领域:

Through our experience, we’ve identified three distinct horizons where opportunities in agentic AI typically emerge:

第一个领域是优化现有流程。这些是"唾手可得"的机会,代理可以通过自动化或增强现有业务运营来实现。虽然这些机会看似不那么令人兴奋,但它们往往能最快地创造价值,也是进入市场的最便捷途径。

The first horizon involves enhancing existing processes. These are the “low-hanging fruit” opportunities where agents can automate or augment current business operations. While these might seem less exciting, they often provide the quickest path to value and the easiest entry point into the market.

第二个领域是以智能体为先的视角重新构想现有服务。智能体正是在此大放异彩,它们能够协调复杂的工作流程,以全新的方式提供传统服务。还记得我们餐厅的二维码实验吗?那正是这一领域的完美例证——它满足了现有的需求(例如电子菜单),并利用智能体人工智能的能力对其进行重新构想。

The second horizon involves reimagining existing services through an agent-first lens. This is where agents shine, orchestrating complex workflows to deliver traditional services in radically new ways. Remember our restaurant’s QR code experiment? That’s a perfect example of this horizon—taking an existing need (digital menus) and reimagining it through the capabilities of agentic AI.

第三个领域或许最令人兴奋:智能体人工智能出现之前无法想象的全新产品和服务类别。这些机遇通常出现在多种趋势和技术的交汇点。虽然这些机遇能带来最大的潜在回报,但也伴随着最高的风险,并且需要对技术的能力和局限性有最深刻的理解。

The third horizon is perhaps the most exciting: entirely new categories of products and services that weren’t possible before agentic AI. These opportunities typically emerge at the intersection of multiple trends and technologies. While these can offer the biggest potential returns, they also carry the highest risk and require the deepest understanding of the technology’s capabilities and limitations.

主动机会识别框架

The Agentic Opportunity Identification Framework

经过反复试验,我们开发了一套用于识别和评估智能体人工智能领域商业机会的综合框架。我们称之为智能体机会识别框架(AOIF),它从四个关键维度来审视各种机会。接下来,我们将详细探讨每个维度。

Through trial and error, we’ve developed a comprehensive framework for identifying and evaluating business opportunities in the agentic AI space. We call it the Agentic Opportunity Identification Framework (AOIF), and it examines opportunities through four critical dimensions. Let’s explore each dimension in detail.

1. 价值链分析

1. Value Chain Analysis

这一维度着重探讨智能体人工智能可以在哪些方面以及如何变革业务流程。每个组成部分都包含一个关键问题,以指导您的分析:

This dimension focuses on identifying where and how agentic AI can transform business processes. Each component comes with a key question to guide your analysis:

任务分解:这指的是将复杂的工作流程分解成更小、更易于管理的任务,这些任务可以通过人工智能进行增强或自动化。例如,在内容创作过程中,可以将流程分解为构思、研究、撰写、编辑和优化。每个环节都为人工智能的增强提供了独特的机会。

Task Decomposition: This involves breaking down complex workflows into smaller, manageable tasks that can be enhanced or automated by AI. For example, in content creation, the process can be broken down into ideation, research, drafting, editing, and optimization. Each of these components presents unique opportunities for AI augmentation.

关键问题:“在您的流程中,哪些离散步骤可以彼此独立地实现自动化或改进?”

Key Question: “What are the discrete steps in your process that could be automated or enhanced independently of each other?”
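Task decomposition can be sketched as a pipeline of independent stages, each of which could be automated or assigned separately. The stages below are hypothetical stand-ins for the content-creation steps named above:

```python
def run_pipeline(brief, stages):
    """Run a brief through a sequence of named, independent stages.
    Because each stage is separable, any one can be automated on its own."""
    artifact = brief
    history = []
    for name, stage in stages:
        artifact = stage(artifact)
        history.append(name)
    return artifact, history

# Hypothetical stand-ins for ideation → research → drafting → editing
stages = [
    ("ideation", lambda b: b + " | angle: cost savings"),
    ("research", lambda b: b + " | 3 sources found"),
    ("drafting", lambda b: b + " | 800-word draft"),
    ("editing",  lambda b: b + " | edited"),
]
result, steps = run_pipeline("Q3 newsletter", stages)
print(steps)  # ['ideation', 'research', 'drafting', 'editing']
```

The analytical payoff is the stage list itself: each entry is a candidate for independent AI augmentation, which is precisely what the key question above asks you to enumerate.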

交接点:这些是工作在不同参与方或系统之间交接的关键节点。根据我们实施人工智能解决方案的经验,这些交接点往往蕴藏着最大的改进机会。例如,客户支持工单从自动初始响应到人工客服再到专业部门的整个流程——每一次交接都蕴藏着人工智能增强的潜力。

Handoff Points: These are the critical junctures where work transitions between different parties or systems. In our experience implementing AI solutions, these handoff points often represent the greatest opportunities for improvement. Consider how a customer support ticket moves from automated initial response to human agent to specialized department—each transition is a potential opportunity for AI enhancement.

关键问题:“当工作在不同人员或系统之间传递时,通常会在哪些环节出现延误或错误?”

Key Question: “Where do delays or errors typically occur when work moves between different people or systems?”

决策节点:这些是流程中必须根据现有信息做出选择的关键点。人工智能擅长处理复杂的决策场景,尤其是在需要同时考虑多个变量的情况下。例如,在供应链管理中,确定最佳库存水平需要平衡众多因素,而人工智能处理这些因素的效率远高于人类。

Decision Nodes: These are points in a process where choices must be made based on available information. AI excels at handling complex decision-making scenarios, especially when multiple variables must be considered simultaneously. For instance, in supply chain management, deciding optimal inventory levels requires balancing numerous factors that AI can process more effectively than humans.

关键问题:“在您的决策过程中,哪些决策需要同时分析多个数据点或变量?”

Key Question: “What decisions in your process require analyzing multiple data points or variables simultaneously?”
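A decision node of the inventory kind described above can be sketched as a single function that combines several variables into one choice. The formula and parameters are illustrative, not a production inventory policy:

```python
def reorder_quantity(on_hand, daily_demand, lead_time_days, safety_stock, order_cap):
    """Illustrative multi-variable decision: how much to reorder so that
    stock covers demand over the supplier's lead time plus a safety buffer,
    without exceeding a hard ordering cap."""
    needed = daily_demand * lead_time_days + safety_stock - on_hand
    return max(0, min(needed, order_cap))

print(reorder_quantity(on_hand=40, daily_demand=10, lead_time_days=7,
                       safety_stock=20, order_cap=100))  # 50
```

Even this toy version weighs five variables at once; a real agent would add demand forecasts, price breaks, and perishability, which is where AI outperforms rule-of-thumb human decisions.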

2. 市场痛点矩阵

2. Market Pain Point Matrix

这一维度有助于识别人工智能具有独特优势去解决的现有问题:

This dimension helps identify existing problems that AI is uniquely positioned to solve:

摩擦点:这些是当前流程中容易造成挫败感、延误或效率低下的地方。我们发现,成功的AI应用通常会首先解决这些摩擦点。例如,律师事务所的文档处理经常会出现严重的瓶颈,而AI可以有效地解决这些问题。

Friction Areas: These are points where current processes create frustration, delays, or inefficiencies. We’ve found that successful AI implementations often target these friction points first. For example, document processing in legal firms often creates significant bottlenecks that AI can effectively address.

关键问题:“你的团队成员经常抱怨或试图逃避哪些任务?”

Key Question: “What tasks do your team members consistently complain about or try to avoid?”

成本中心:这些领域的运营必不可少,但成本高昂。人工智能通常可以在保持甚至提升质量的同时,显著降低成本。其核心在于利用智能体自动化执行高成本、劳动密集型的操作,从而提高利润率。例如,可以考虑人工智能辅助编码如何在降低开发成本的同时提升代码质量。

Cost Centers: These are areas where operations are necessary but expensive. AI can often dramatically reduce costs while maintaining or improving quality. It is about leveraging agents to automate high-cost, labor-intensive operations, improving margins. Consider how AI-assisted coding can reduce development costs while improving code quality.

关键问题:“哪些流程消耗的预算与其价值不成比例?”

Key Question: “Which processes consume a disproportionate amount of your budget relative to their value?”

质量差距:这些领域是指现有解决方案无法始终如一地达到预期标准的方面。人工智能能够保持稳定的性能,因此是弥补这些差距的理想选择。例如,在医学影像领域,人工智能可以提供一致的初步筛查结果,从而减少人为错误。

Quality Gaps: These are areas where current solutions fail to meet desired standards consistently. AI’s ability to maintain consistent performance makes it ideal for addressing these gaps. For instance, in medical imaging, AI can provide consistent initial screenings that reduce human error.

关键问题:“您认为您目前的流程中,质量或性能方面最大的差异在哪里?”

Key Question: “Where do you see the most variation in quality or performance in your current processes?”

3. AI代理能力匹配

3. AI Agent Capability Alignment

这一维度确保您的解决方案能够有效利用人工智能的优势。了解这些能力对于识别可行的机会至关重要:

This dimension ensures that your solution leverages AI’s strengths effectively. Understanding these capabilities is crucial for identifying feasible opportunities:

语言理解:这项能力使人工智能能够处理和生成人类语言,使其成为处理沟通、文档编写、文本信息摘要或分析等任务的理想选择。现代语言模型可以处理从简单的分类到复杂的推理和生成等各种任务。

Language Understanding: This capability enables AI to process and generate human language, making it ideal for tasks involving communication, documentation, summarization or analysis of text-based information. Modern language models can handle tasks ranging from simple classification to complex reasoning and generation.

关键问题:“你们的哪些流程严重依赖于阅读、写作或解释文本?”

Key Question: “Which of your processes rely heavily on reading, writing, or interpreting text?”

模式识别:这是人工智能识别数据中趋势和关联性的能力,使其非常适合用于预测性维护、欺诈检测或市场分析。模式识别能力不断发展,能够处理各种数据类型中日益复杂和微妙的模式。

Pattern Recognition: This is AI’s ability to identify trends and correlations in data, making it perfect for predictive maintenance, fraud detection, or market analysis. Pattern recognition capabilities have evolved to handle increasingly complex and subtle patterns across various data types.

关键问题:“识别数据中的模式或趋势能在哪些方面提供显著价值?”

Key Question: “Where could identifying patterns or trends in your data provide significant value?”

推理链:这涉及人工智能遵循逻辑步骤并建立联系的能力,这对解决复杂问题(例如法律分析或医疗诊断)至关重要。现代人工智能系统能够保持上下文关联并遵循多步骤逻辑流程。

Reasoning Chains: This involves AI’s ability to follow logical steps and make connections, which is crucial for complex problem-solving tasks like legal analysis or medical diagnosis. Modern AI systems can maintain context and follow multi-step logical processes.

关键问题:“哪些决策需要遵循特定的逻辑步骤?”

Key Question: “What decisions require following a specific sequence of logical steps?”

4. 整合机会

4. Integration Opportunities

这一维度着重探讨如何将新的人工智能解决方案融入现有系统。成功往往取决于与现有工作流程的无缝集成:

This dimension focuses on how new AI solutions can fit into existing systems. Success often depends on seamless integration with current workflows:

现有工具:与其替换现有工具,不如寻找机会加以改进,这通常能带来更快的采用速度和更好的效果。思考一下您的 AI 解决方案如何能够增强 Salesforce、Microsoft Office 或行业专用软件等常用平台的功能。

Existing Tools: Identifying opportunities to enhance rather than replace current tools often leads to faster adoption and better results. Consider how your AI solution can augment popular platforms like Salesforce, Microsoft Office, or industry-specific software.

关键问题:“哪些软件工具对您目前的运营至关重要?”

Key Question: “What software tools are central to your current operations?”

数据可用性:易于获取的数据能够加速人工智能代理的部署并提升其效能。评估现有数据及其可访问性有助于找到人工智能实施的“唾手可得”的切入点。请同时考虑组织内部的结构化和非结构化数据源。

Data Availability: Readily available data accelerates AI agent deployment and effectiveness. Assessing what data is already available and accessible can help identify low-hanging fruit for AI implementation. Consider both structured and unstructured data sources within your organization.

关键问题:“您目前收集了哪些数据,但尚未充分利用?”

Key Question: “What data are you already collecting but not fully utilizing?”

API 生态系统:轻松集成到强大的生态系统中可加速扩展。了解您的解决方案可以接入哪些现有平台,可以显著缩短开发时间并扩大市场覆盖范围。现代企业依赖于互联系统,这为人工智能集成创造了机遇。

API Ecosystems: Easy integration into robust ecosystems accelerates scalability. Understanding where your solution can plug into existing platforms can dramatically reduce development time and increase market reach. Modern businesses rely on interconnected systems, creating opportunities for AI integration.

关键问题:“您所在行业中哪些平台拥有强大的 API 生态系统?”

Key Question: “Which platforms in your industry have robust API ecosystems?”

从理论到实践:新闻通讯代理的故事

From Theory to Practice: The Newsletter Agent Story

让我们分享一个最近的例子,它完美地说明了如何利用该框架在智能人工智能领域发现和抓住机遇。

Let us share a recent example that perfectly illustrates how to spot and seize opportunities in the agentic AI landscape by using the framework.

几个月前,我们发现内容创作者和小企业主普遍面临一个痛点:创建、编辑和分发电子报刊的过程非常耗时。虽然市面上有很多工具可以帮助完成这些步骤,但没有一款工具能够独立完成所有步骤(内容编辑、摘要、格式化、审核和发布)。

A few months ago, we identified a common pain point among content creators and small business owners: the time-consuming process of creating, curating, and distributing newsletters. While numerous tools existed to help with this process, no single tool on the market could perform all the steps (curation, summarization, formatting, reviewing, and publishing) autonomously.

我们看到了一个创造全新产品的机会:一个能够在我们智能体人工智能框架的第三层运行的自主新闻通讯代理。这个代理不仅能自动发送新闻通讯,还能自主筛选内容、撰写引人入胜的摘要,甚至根据读者的参与模式优化发送时间。

We saw an opportunity to create something different: an autonomous newsletter agent that could operate at Level 3 of our agentic AI framework. This agent wouldn’t just automate the process of sending newsletters—it would autonomously curate content, write engaging summaries, and even optimize sending times based on reader engagement patterns.

这个机会之所以特别吸引人,是因为它符合我们对有前景的智能体业务的全部三个关键标准:

What made this opportunity particularly compelling was that it hit all three of our key criteria for a promising agent-based business:

1.它解决了一个明显的痛点(耗时的内容策划和创作)。

1. It addressed a clear pain point (time-consuming content curation and creation)

2.它能够在极少人工干预的情况下自主运行

2. It could operate autonomously with minimal human oversight

3.它可以高效扩展,同时为多个客户提供服务。

3. It could scale efficiently, serving multiple clients simultaneously

我们开发的代理程序可以扫描指定来源的相关内容,以客户的品牌声音撰写引人入胜的摘要,并自动编译和发送新闻简报。

The agent we built could scan specified sources for relevant content, write engaging summaries in the client’s brand voice, and automatically compile and send newsletters.

关键的洞见不仅在于发现了技术上的可行性,更在于认识到如何将这种能力包装成可扩展的商业模式。我们不会将代理作为产品出售,而是将其作为服务提供:一种兼具人工质量和机器级一致性及规模的自主新闻通讯管理方案。

The key insight wasn’t just in identifying the technical possibility—it was in recognizing how this capability could be packaged as a scalable business model. Instead of selling the agent as a product, we would offer it as a service: autonomous newsletter management with human-level quality but machine-level consistency and scale.

第一步:识别机会

Step 1: Opportunity identification

我们创建新闻通讯代理系统的旅程,始于一系列密集的研讨会,旨在发掘和验证人工智能代理的商业机会。我们召集了一支由内容创作者、商业领袖和技术专家组成的多元化团队,开展为期两周的结构化探索流程。我们的第一场研讨会是一次时长三小时、聚焦痛点识别的工作坊,它揭示了各组织普遍面临的困扰:创建引人入胜的新闻通讯是一项繁重的任务。内容创作者和企业领导者每周都要花费数小时浏览文章、撰写摘要和排版新闻通讯,而这些时间本可以用于更具战略意义的工作。通过系统地应用我们的框架,我们不仅发现了一个痛点,还发现了一个机会,可以展示人工智能代理如何改变这一复杂的多步骤流程。

The journey to create our newsletter agent system began with an intensive series of workshops designed to uncover and validate business opportunities for AI agents. We gathered a diverse team of content creators, business leaders, and technology experts for a structured exploration process that would span two weeks. Our first session, a focused three-hour pain point identification workshop, revealed a common frustration across organizations: the overwhelming task of creating engaging newsletters. Content creators and business leaders were spending hours each week scanning articles, writing summaries, and formatting newsletters—time they could have spent on more strategic work. Through our systematic framework application, we uncovered not just a pain point, but an opportunity to demonstrate how AI agents could transform a complex, multi-step process.

步骤二:价值链分析

Step 2: Value Chain Analysis

在初步调研会议之后,我们与主要利益相关者举办了一场为期半天的价值链映射研讨会。我们使用大型白板和流程图工具,细致地将新闻通讯的制作流程分解成各个组成部分。经过四个小时的深入协作分析,我们最终确定了七个不同的任务:内容发现、摘要撰写、每日精选邮件、文章筛选、汇编、排版和最终审核。引起我们注意的是这些任务之间的交接点——每一次交接都是一个潜在的瓶颈,可能造成时间损失和错误。尤其值得注意的是内容发现和摘要撰写之间的交接环节,我们发现内容创作者需要花费大量时间在不同的工具之间复制粘贴。

Following the initial discovery session, we conducted a half-day Value Chain Mapping workshop with key stakeholders. Using large whiteboards and process mapping tools, we meticulously broke down the newsletter creation process into its constituent parts. After four hours of intense collaborative analysis, the decomposition revealed seven distinct tasks: content discovery, summarization, daily curation emails, article selection, compilation, formatting, and final review. What caught our attention were the handoff points between these tasks—each transition represented a potential bottleneck where time was lost and errors could occur. Particularly interesting was the handoff between content discovery and summarization, where we found content creators spending considerable time copying and pasting between different tools.
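
The decomposition above can be sketched in code. This is our own minimal illustration, not the workshop's actual tooling: the task names come from the text, and the idea that every consecutive pair of tasks is a handoff point is simply made explicit.

```python
# Illustrative sketch: the seven newsletter tasks identified in the workshop,
# with handoff points derived from each pair of consecutive tasks.

NEWSLETTER_TASKS = [
    "content discovery",
    "summarization",
    "daily curation emails",
    "article selection",
    "compilation",
    "formatting",
    "final review",
]

def handoff_points(tasks):
    """Each transition between consecutive tasks is a potential bottleneck."""
    return list(zip(tasks, tasks[1:]))

for src, dst in handoff_points(NEWSLETTER_TASKS):
    print(f"{src} -> {dst}")
```

Seven tasks yield six handoffs; listing them explicitly is what surfaced the content-discovery-to-summarization transition as the costliest one.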

步骤 3:市场痛点矩阵

Step 3: Market Pain Point Matrix

我们的市场痛点矩阵分析加深了我们对市场机遇的理解。通过与内容创作者的访谈,我们发现了三个主要的痛点:内容发现耗时、摘要质量难以保持一致以及新闻简报格式繁琐。在分析成本中心时,我们发现企业通常每周要投入15-20小时的熟练员工时间来制作新闻简报——对于一个本质上重复性的工作而言,这是一笔不小的投资。

Our Market Pain Point Matrix analysis deepened our understanding of the opportunity. Through interviews with content creators, we identified three major friction areas: the time-consuming nature of content discovery, the challenge of maintaining consistent quality in summaries, and the tedious process of formatting newsletters. When we examined the cost centers, we found that organizations were typically dedicating 15-20 hours per week of skilled employees’ time to newsletter creation—a significant investment for what was essentially a repetitive process.

第四步:人工智能能力匹配

Step 4: AI Capability Alignment

下一阶段包括一次至关重要的两小时能力评估研讨会,我们邀请了人工智能专家和流程负责人参加。人工智能能力匹配分析的结果尤其具有启发性。通过结构化的评估矩阵,我们梳理了不同的人工智能能力如何解决每个已识别的痛点。语言理解可以处理摘要任务,模式识别可以辅助内容相关性评估,推理链可以管理整体工作流程的协调。这种通过协作评分和讨论完成的系统性映射表明,我们拥有有效解决每个主要痛点的技术能力。

The next phase involved a critical two-hour Capability Assessment workshop where we brought together our AI experts and process owners. The AI Capability Alignment analysis proved particularly revealing. Through a structured evaluation matrix, we mapped out how different AI capabilities could address each identified pain point. Language understanding could handle the summarization tasks, pattern recognition could help with content relevance assessment, and reasoning chains could manage the overall workflow orchestration. This systematic mapping, completed through collaborative scoring and discussion, showed us that we had the technological capabilities to address each major pain point effectively.
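
The evaluation matrix from that workshop can be represented as a simple mapping. The pairings below follow the text; the 1-5 fit scores are invented here purely to show the structure of such a matrix, not the workshop's actual numbers.

```python
# Hypothetical capability-to-pain-point matrix (scores are illustrative).
# Each pain point is paired with the AI capability mapped to it and a fit score.

capability_map = {
    "time-consuming content discovery": ("pattern recognition", 5),
    "inconsistent summary quality":     ("language understanding", 4),
    "workflow coordination overhead":   ("reasoning chains", 4),
}

for pain_point, (capability, fit) in capability_map.items():
    print(f"{pain_point}: {capability} (fit {fit}/5)")
```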

第五步:整合机会

Step 5: Integration Opportunities

最后一周以一场内容全面的半天集成规划研讨会拉开帷幕。技术架构师、最终用户和系统管理员齐聚一堂,共同梳理集成点和潜在挑战。通过一系列结构化的练习和技术深入探讨,我们发现大多数组织已经在使用电子邮件进行内容共享,并使用 Google Docs 进行协作。这一洞察促使我们围绕这些常用工具设计代理系统,从而降低学习难度,提高用户采纳率。研讨会最终形成了一份详细的集成路线图和技术需求文档。

The final week kicked off with a comprehensive half-day Integration Planning workshop. Technical architects, end users, and system administrators came together to map out integration points and potential challenges. Through a series of structured exercises and technical deep dives, we discovered that most organizations were already using email for content sharing and Google Docs for collaboration. This insight led us to design our agent system around these familiar tools, reducing the learning curve and increasing the likelihood of adoption. The workshop concluded with a detailed integration roadmap and technical requirements document.

步骤六:机会优先级排序

Step 6: Opportunity Prioritization

整个流程最终以一场三小时的机会优先级排序会议告终,我们在会上汇总了之前的所有发现。我们采用系统化的评分方法,从五个关键维度评估了新闻通讯代理这一机会与其他潜在项目的优劣:痛点严重程度、技术可行性、集成复杂性、潜在影响和资源需求。新闻通讯项目脱颖而出,成为最终赢家,在可行性和潜在影响方面得分尤其高。

The process culminated in a final three-hour Opportunity Prioritization session where we brought together all previous findings. Using a systematic scoring approach, we evaluated the newsletter agent opportunity against other potential projects across five key dimensions: pain point severity, technical feasibility, integration complexity, potential impact, and resource requirements. The newsletter project emerged as the clear winner, scoring particularly high on feasibility and potential impact.
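
A scoring session like this can be sketched as a weighted sum over the five dimensions. The weights, the inversion of the two "cost" dimensions, and the sample scores below are all our own assumptions for illustration; the book does not publish the actual rubric.

```python
# Illustrative weighted scoring over the five prioritization dimensions.
# Weights sum to 1.0; integration complexity and resource requirements are
# costs, so their raw 1-5 scores are inverted (6 - raw) before weighting.

DIMENSIONS = {
    "pain_point_severity":    0.25,
    "technical_feasibility":  0.25,
    "integration_complexity": 0.15,
    "potential_impact":       0.25,
    "resource_requirements":  0.10,
}
INVERTED = {"integration_complexity", "resource_requirements"}

def score(project: dict) -> float:
    """Weighted score on a 1-5 scale, with cost dimensions inverted."""
    total = 0.0
    for dim, weight in DIMENSIONS.items():
        raw = project[dim]
        total += weight * ((6 - raw) if dim in INVERTED else raw)
    return round(total, 2)

# Hypothetical scores for the newsletter opportunity.
newsletter = {
    "pain_point_severity": 5, "technical_feasibility": 5,
    "integration_complexity": 2, "potential_impact": 4,
    "resource_requirements": 2,
}
print(score(newsletter))
```

Running each candidate project through the same function makes the comparison explicit and auditable, which is the point of a systematic scoring approach.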

基于此分析,我们设计了一个多智能体系统,其中每个智能体都专注于特定任务,并与其他智能体协同工作。搜索智能体负责持续扫描预定义的资源,查找相关内容,并利用模式识别来评估文章的相关性。摘要智能体运用先进的语言理解技术,生成连贯且引人入胜的摘要。邮件智能体负责日常沟通的关键任务,确保人工审阅者收到组织良好的内容以供最终筛选。

Based on this analysis, we designed a multi-agent system where each agent specialized in a specific task while working in concert with others. The Search Agent was engineered to continuously scan predefined sources for relevant content, using pattern recognition to assess article relevance. The Summarization Agent employed advanced language understanding to create consistent, engaging summaries. The Email Agent handled the critical task of daily communication, ensuring that human reviewers received well-organized content for their final selection.
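
The division of labor among the three agents can be sketched as a toy pipeline. The class and method names below are illustrative stand-ins, assuming a relevance threshold and a brand-voice tag; the actual system's implementation is not shown in the book.

```python
# Toy sketch of the three-agent newsletter pipeline (illustrative only).
from dataclasses import dataclass

@dataclass
class Article:
    title: str
    relevance: float  # stand-in for a pattern-recognition relevance score
    summary: str = ""

class SearchAgent:
    def scan(self, source_articles, threshold=0.7):
        """Keep only articles whose relevance score clears the threshold."""
        return [a for a in source_articles if a.relevance >= threshold]

class SummarizationAgent:
    def summarize(self, articles, brand_voice="concise"):
        """Attach a summary in the client's brand voice to each article."""
        for a in articles:
            a.summary = f"[{brand_voice}] {a.title} in one paragraph."
        return articles

class EmailAgent:
    def compile_digest(self, articles):
        """Compile the daily digest that human reviewers receive."""
        lines = [f"- {a.title}: {a.summary}" for a in articles]
        return "Daily curation digest:\n" + "\n".join(lines)

candidates = [Article("Agent frameworks compared", 0.9),
              Article("Unrelated gadget news", 0.3)]
pipeline = EmailAgent().compile_digest(
    SummarizationAgent().summarize(SearchAgent().scan(candidates)))
print(pipeline)
```

Each agent owns one stage and hands a typed result to the next, mirroring the handoff points identified in the value chain analysis.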

结果

The outcome

或许,我们框架最显著的验证来自系统的快速普及和积极的反馈。各机构报告称,该自动化系统将他们的简报制作时间缩短了 80%,同时保持甚至提高了质量。摘要的一致性有所提升,每日定期发送的电子邮件也帮助内容创作者及时掌握行业动态,而不会感到压力过大。

Perhaps the most significant validation of our framework came from the system’s rapid adoption and positive feedback. Organizations reported that the automated system reduced their newsletter creation time by 80%, while maintaining or improving quality. The consistency of the summaries improved, and the regular daily emails helped content creators stay on top of their industry news without feeling overwhelmed.

前路漫漫

The Road Ahead

如今智能体人工智能的机遇让我们想起了移动应用经济的早期阶段。正如第一波应用将日历和计算器等现有工具数字化一样,我们现在也看到了第一波智能体自动化现有的业务流程。然而,真正的变革将来自那些能够想象这项技术所带来的全新可能性的企业家。

The opportunities in agentic AI today remind us of the early days of the mobile app economy. Just as the first wave of apps digitized existing tools like calendars and calculators, we’re seeing the first wave of agents automating existing business processes. However, the real revolution will come from entrepreneurs who can imagine entirely new possibilities enabled by this technology.

智能体人工智能领域蕴藏着巨大的机遇,但要取得成功,需要系统地进行识别和验证。本框架提供了一种结构化的方法来发现和评估机遇,帮助创业者将精力集中在最有前景的领域。

The opportunities in agentic AI are vast, but success requires a systematic approach to identification and validation. This framework provides a structured way to discover and evaluate opportunities, helping entrepreneurs focus their efforts on the most promising areas.

成功的关键不仅在于了解智能体目前的能力,更在于预测它们未来的能力,同时构建能够立即创造价值的解决方案。未来属于那些能够弥合当前能力与未来可能性之间差距的人。

The key to success isn’t just understanding what agents can do today—it’s anticipating what they’ll be capable of tomorrow while building solutions that deliver value right now. The future belongs to those who can bridge this gap between current capabilities and future possibilities.

第四部分

PART 4

通过智能体人工智能实现企业转型

ENTERPRISE TRANSFORMATION THROUGH AGENTIC AI

 

 

第三部分向您展示了如何构建高效的AI代理并作为企业家创造价值,而第四部分则探讨了一个更大的挑战:如何利用这项技术改造整个组织。对于那些希望大规模利用AI代理力量的企业来说,这才是真正考验其能力的地方。

While Part 3 showed you how to build effective AI agents and create value as an entrepreneur, Part 4 tackles an even bigger challenge: how to transform entire organizations with this technology. This is where the rubber meets the road for businesses seeking to harness the power of AI agents at scale.

在我们从事人工智能转型工作的整个职业生涯中,我们观察到一个显著的规律:仅仅拥有卓越的技术是远远不够的。即使是最先进的人工智能代理,如果人们不信任它,流程没有围绕它进行重新设计,或者治理结构没有为其提供支持,最终也会失败。简而言之,组织转型与技术实施同等重要,而且往往更具挑战性。

Throughout our careers implementing AI transformations, we’ve observed a striking pattern: technical excellence alone is never enough. The most sophisticated AI agent will fail if people don’t trust it, processes aren’t redesigned around it, or governance structures don’t support it. In short, organizational transformation is just as critical as technological implementation—and often far more challenging.

因此,第四部分不仅探讨人工智能代理的技术层面,还着重分析决定其规模化成功的因素,包括人、组织和战略层面。我们已经看到太多前景光明的人工智能项目在试点成功后停滞不前,无法克服组织层面的障碍,最终导致大规模应用受阻。我们不希望这种情况发生在您身上。

That’s why Part 4 goes beyond the technical aspects of AI agents to address the human, organizational, and strategic dimensions that determine success at scale. We’ve seen too many promising AI initiatives stall after successful pilots, unable to overcome the organizational barriers to widespread adoption. We don’t want that to happen to you.

从愿景到现实的道路很少是一帆风顺的,但只要方法得当,变革就触手可及。

The path from vision to reality is rarely straightforward, but with the right approach, transformational change is within reach.

第十章

CHAPTER 10

人机协作:领导力、信任与变革

HUMAN-AGENT COLLABORATION: LEADERSHIP, TRUST, AND CHANGE

大规模掌握工作设计和变革管理

Mastering Work Design and Change Management at Scale

去年夏天,我们遇到了一件彻底改变我们对人工智能代理部署中变更管理理解的事情。当时我们正在与一家大型保险公司合作,为其部署第一批用于处理理赔的人工智能代理。技术实施进展顺利——或许过于顺利了。我们没有预料到的是,它会在整个组织内引发连锁反应。

Last summer, we encountered a situation that would fundamentally reshape our understanding of change management in AI agent deployments. We were working with a large insurance company, implementing their first wave of AI agents to handle claims processing. The technology implementation was proceeding smoothly—perhaps too smoothly. What we didn’t anticipate was the ripple effect it would create throughout the organization.

“我处理理赔已经十五年了,”资深理赔员弗洛拉说道,“我怎么能确定这位‘代理人’不会犯错,让我来收拾残局呢?”她的担忧反映了部门内弥漫着越来越深的焦虑。尽管我们进行了周密的技术规划,却低估了转型过程中人的因素。事后看来,这其实并不令人意外,毕竟,部署人工智能代理涉及一个转型过程,会对员工的核心行为、价值观和认知产生重大影响。

“I’ve been processing claims for fifteen years,” shared Flora, a senior claims processor. “How can I be sure this ‘agent’ won’t make mistakes that I’ll have to fix?” Her concern reflected a deeper anxiety spreading through the department. Despite our careful technical planning, we had underestimated the human element of the transformation. In hindsight, this should not have surprised us, because, after all, deploying an AI agent involves a transformation process that has a significant impact on core employee behaviors, values, and perceptions.

在另一家公司,一家亚洲银行,智能自动化部门的负责人对智能体充满热情。他说:“我们可以在三个月内开发出智能体,取代四分之三的现有团队!”然而,他的一位同样从事业务流程改进但更注重变革管理的同事评论道:“我们的流程非常复杂,如果裁员,很多环节都会出问题。”

At another company, an Asian bank, the head of intelligent automation was an enthusiast for agents. He said, “We can build agents in 3 months and replace ¾ of the team!” However, his colleague, who also worked in business process improvement but had a change management orientation, commented, “Our processes are so complex that if you eliminate people, you will break a lot of things.”

这些经验教会了我们一个至关重要的教训:人工智能代理部署的成功与否,不仅取决于技术,也取决于人。人们会负责部署代理系统、监控其性能、纠正错误、找出问题所在并尝试解决——或者,他们也可能拒绝承担所有这些重要的任务。

These experiences taught us a crucial lesson: the success of AI agent deployments depends as much on people as it does on technology. People will implement the agentic systems, monitor their performance, fix their mistakes, identify what went wrong, and try to fix them—or resist doing all of these important tasks.

成功实施智能体人工智能涉及诸多方面。首先,当然是推动和资助整个流程并做出关键决策的人类领导者。我们将在下一章详细阐述这些内容。其次是工作的详细设计:智能体将执行哪些任务,人类又将扮演什么角色?最后,还有变革管理——确保人类员工能够接受、理解并参与到向智能体同事过渡的过程中。

There are multiple human aspects to successfully implementing agentic AI. One, of course, is the human leadership that drives and funds the process and makes critical decisions throughout. We’ll describe those in the next chapter. Another is the detailed design of work: what will the agents do, and what role will humans play? Finally, there is change management—ensuring that human employees accept, understand, and can play a role in the transition to agents as colleagues.

为智能体与人类设计工作

Designing Work for Agents and Humans

让我们面对现实。对企业而言,人工智能代理的一大吸引力在于其对人工投入和干预的需求较低。生成式人工智能在2022年末兴起时,我们都被它深深吸引,但许多机构发现,在人工提示和编辑输出结果的情况下,生产力提升并未达到预期。

Let’s be realistic. One major appeal to companies in pursuing AI agents is a lower requirement for human labor and interventions. We were all dazzled by generative AI when it became popular in late 2022, but many organizations found that with human prompting and editing of outputs, there wasn’t the productivity improvement they hoped for.

借助智能体人工智能,至少在大多数情况下,人类干预输入和输出的需求将会降低。然而,正如我们之前讨论过的,随着智能体技术的成熟,对人类参与的需求会因具体应用场景和时间推移而变化。这意味着我们需要进行初始和持续的工作设计。对于任何给定的任务、工作流程或业务流程,都需要有人来决定智能体可以独立完成哪些工作,以及何时需要升级或由人类干预。这种干预可能仅在以下情况下发生:智能体判断自身无法完成请求的任务,或者其行为的后果(例如,经济价值或对客户的影响)足够严重,需要进行监控或审查。

With agentic AI, there will be—in most cases, at least—a lesser need for humans to intervene in inputs and outputs. As we’ve discussed, however, the need for human involvement will vary by the use case and over time as agentic technology matures. This all means that there is a need for initial and continuing work design. For any given task, workflow, or business process, someone needs to decide what agents can do on their own and when there is a need for escalation or intervention by humans. That intervention might take place only when the agent determines that it can’t do the requested task on its own, or when the consequences of its actions are sufficiently great (in terms of monetary value or the impact on a customer, for example) to require some monitoring or review.

这种工作设计传统上与业务流程改进或更大规模的重组相结合。这些流程方法在20世纪90年代和21世纪初非常流行,但在过去一二十年中有所衰落。随着人工智能的发展,它们开始复兴;企业意识到分析型和生成型人工智能可以实现新的流程设计,而流程和任务挖掘等相对较新的技术可以缩短设计和实施新流程的周期。此外,在机器人流程自动化(RPA)中,通常也会进行一定程度的工作设计,因为企业意识到,在实现自动化之前,不妨先改进流程。

This type of work design has traditionally been done with business process improvement or larger-scale reengineering. These process disciplines were popular in the 1990s and early 2000s, but have faded somewhat in the last decade or two. They have begun to return with AI; companies realize that analytical and generative AI can enable new process designs, and relatively new technologies like process and task mining can shorten the cycle time for designing and implementing new process flows. There was often also some degree of work design with robotic process automation, as companies realized they might as well improve the process before automating it.

拥有流程和工作设计经验的公司在应用智能体技术时将更具优势。他们懂得如何规划工作流程,如何部署人工智能功能,以及如何让员工参与到工作任务的设计和执行中。如果他们能够熟练运用“流程智能”工具,就能了解员工或人工智能代理是如何执行任务的,以及这些执行方式如何影响订单到收款或采购到付款等整体流程。

Companies that have a history of process and work design will have an advantage with agentic technology. They will understand how to lay out workflows, plan for enabling AI capabilities, and involve humans in the design and execution of work tasks. If they are comfortable with “process intelligence” tools, they’ll know how their tasks are being executed by either humans or AI agents, and how that impacts broad processes like order-to-cash or procure-to-pay.

当然,人们在日常工作任务和流程中扮演的角色类型可能会发生一些变化。许多结构化和重复性的工作将由智能体完成,从而将人类从乏味的任务中解放出来。然而,这些工作过去可能由入门级员工承担,未来对这类员工的需求可能会减少。人类对智能体完成的任务进行审核和修正,可能需要较高的技能水平,而这些技能只有经验丰富的员工才具备。劳动力市场的这种转变不仅会对个人职业生涯产生重大影响,还会对整个经济体和人口群体产生深远影响。

There will, of course, be some likely changes to the types of roles that humans are expected to play in day-to-day work tasks and processes. Much of the structured and repetitive work will be done by agents, relieving humans of boring tasks. However, that work may have been done by entry-level human employees in the past. There may be a need for fewer of those workers in particular. The review and remediation functions that humans are likely to perform on agent-completed tasks may require a relatively high level of skills that only experienced employees have. These shifts in the labor force may have important impacts not only on individual careers, but also on entire economies and demographic groups.

虽然很难提前预估不同类型员工需要掌握的所有技能和完成的任务,但应该尽可能提前通知并给予他们充足的准备时间,以便他们获得所需的能力。这种规划需要组织内技术部门和人力资源部门之间开展以往并不常见的合作。事实上,我们可能会看到一些前所未有的组合角色出现,例如“人力和数字资源管理”。

While it’s difficult to anticipate in advance all of the skills and tasks that different types of human employees will be asked to perform, they should be given as much notice and preparation time as possible to acquire the needed capabilities. This planning will require collaboration between technology functions and human resources groups in organizations that has not typically happened in the past. Indeed, we may see the need for combined roles like “human and digital resource management” that have never previously existed.

化恐惧为机遇:转变对人工智能代理的固有观念

Transforming Fear into Opportunity: Changing Mindsets About AI Agents

“他们是不是要用这些人工智能来取代我们?”一位全球制造公司的团队负责人提出的这个问题,道出了我们反复遇到的担忧。虽然在很多情况下这种担忧可能是错误的,但它绝非毫无道理。我们发现,要消除这种心态,仅仅安抚是不够的——它需要透明度、证据和切实成功的案例相结合。

“They’re bringing in these AI agents to replace us, aren’t they?” This question, posed by a team lead at a global manufacturing company, captures a fear we’ve encountered repeatedly. Although it may be incorrect in many situations, it is certainly not irrational. What we’ve learned is that addressing this mindset requires more than just reassurance—it needs a combination of transparency, evidence, and tangible examples of success.

在这家制造商,我们采取了三管齐下的策略,彻底改变了人们对人工智能代理的看法。首先,我们邀请了其他公司中成功将人工智能代理融入工作流程的团队。这些并非高管汇报,而是员工之间的直接对话,让他们能够坦诚地分享最初的担忧,以及工作环境如何因此而得到改善。

At this particular manufacturer, we took a three-pronged approach that transformed the narrative around AI agents. First, we brought in teams from other companies who had successfully integrated AI agents into their workflows. These weren’t executive presentations—they were peer-to-peer conversations where employees could honestly discuss their initial fears and how the context of their jobs had evolved for the better.

一位零售公司的客服代表分享了人工智能代理如何改变了她的工作,这尤其令人印象深刻:“我以前每天要花 70% 的时间处理重复性咨询。现在这些都由代理处理,我可以专注于真正能发挥作用的复杂客户问题。我的工作满意度确实提高了。”

A particularly powerful moment came when a customer service representative from a retail company shared how AI agents had transformed her role: “I used to spend 70% of my day on repetitive queries. Now the agents handle those, and I focus on complex customer issues where I can really make a difference. My job satisfaction has actually increased.”

其次,我们开展了所谓的“一日工作体验”研讨会。我们没有进行关于人工智能的抽象讨论,而是与团队合作,详细规划人工智能代理将如何改变他们的日常工作。结果表明,代理并非取代人工,而是主要消除工作中那些单调乏味的部分——而这些部分恰恰是大多数人本来就不喜欢的。

Second, we implemented what we call “Day in the Life” workshops. Instead of abstract discussions about AI, we worked with teams to map out exactly how their daily work would change with AI agents. This revealed that rather than replacement, the agents would primarily eliminate the mundane aspects of their jobs—the parts most people didn’t enjoy anyway.

第三,我们为每个团队制定了“未来角色发展路线图”。这些路线图并非空泛的技能提升承诺,而是详细的计划,展示了角色将如何演变以及将会出现哪些新的机遇。例如,我们展示了某些团队成员如何转型为“自动化专家”,将他们深厚的流程知识与新的技术技能相结合,从而管理和改进人工智能代理。

Third, we created “Future Role Roadmaps” for each team. These weren’t vague promises about reskilling but detailed plans showing how roles would evolve and what new opportunities would emerge. For example, we showed how certain team members could become “automation specialists,” combining their deep process knowledge with new technical skills to manage and improve the AI agents.

结果令人瞩目。在一个部门,最初有82%的员工对工作保障表示担忧,但在采用这种方法六个月后,76%的员工表示对人工智能代理的整合持积极态度。关键在于展示,而不仅仅是描述,人工智能代理如何增强而非取代人类的能力。

The results were striking. In one department where 82% of employees initially expressed concerns about job security, after six months of this approach, 76% reported feeling positive about the integration of AI agents. The key was showing, not just telling, how AI agents could enhance rather than replace human capabilities.

面向智能人工智能时代的技能演进

Evolving Skills for the Agentic AI Era

凭借我们在各类组织中部署人工智能代理的丰富经验,我们开发了一套名为“人工智能代理协作能力模型”(AICCM)的综合框架。该模型概括了员工和领导者在从使用基础人工智能工具(例如ChatGPT或Gemini等传统大型语言模型)过渡到与人工智能代理协作时必须经历的基本技能转变。

Through our extensive experience implementing AI agents across diverse organizations, we’ve developed a comprehensive framework we call the “AI Agent Collaboration Capability Model” (AICCM). This model captures the fundamental skill transitions that workers and leaders must navigate as they move from working with basic AI tools—like traditional large language models, such as ChatGPT or Gemini—to collaborating with AI agents.

基于对数十个实施案例的观察,该框架为组织提供了一份路线图,帮助其培养在智能人工智能时代取得成功所必需的人类能力。让我们来探讨员工和领导者必须掌握的AICCM的四个关键维度:

Based on patterns observed across dozens of implementations, this framework provides organizations with a roadmap for developing the human capabilities essential for success in the agentic AI era. Let’s explore the four key dimensions of the AICCM that workers and leaders must master:

从任务思维到工作流程思维

From Task to Workflow Thinking

早期的AI工具是逐个任务运行的,而智能体系统则是在工作流层面运行,协调相互关联的流程以实现更广泛的目标。这要求工作人员开发:

While earlier AI tools operated on a task-by-task basis, agentic systems work at the workflow level, orchestrating interconnected processes to achieve broader outcomes. This requires workers to develop:

流程图绘制技能:能够理解各个任务如何在更广泛的工作流程中相互关联,确保所有组件高效协同工作。

Process Mapping Skills: The ability to understand how individual tasks connect across broader workflows, ensuring all components work efficiently together

系统优化:设计和改进系统,使智能体能够在多个领域顺利运行。

System Optimization: Designing and refining systems that allow agents to operate smoothly across multiple domains

跨学科思维:理解各项任务和领域如何相互关联,从而创建综合解决方案。

Cross-Disciplinary Thinking: Understanding how tasks and fields interconnect to create comprehensive solutions

结果导向:较少关注个体任务的执行,更多关注定义指导代理人活动的预期结果

Outcome Orientation: Focusing less on individual task execution and more on defining desired outcomes that guide agent activities

一家制造公司的高级运营经理表示:“我掌握的最有价值的技能不是编码或提示词工程,而是能够绘制端到端流程图,并确定智能体可以在我们运营的哪些环节发挥最大作用。”

A senior operations manager at a manufacturing company shared, “The most valuable skill I’ve developed isn’t coding or prompt engineering—it’s being able to map out end-to-end processes and identify where agents can have the biggest impact across our operation.”

从控制到授权

From Control to Delegation

从直接控制到有效授权的转变,对许多员工来说或许是最具挑战性的。随着智能体的出现,员工必须培养以下能力:

The shift from direct control to effective delegation is perhaps the most challenging for many employees. With agentic AI, workers must develop:

监督能力:无需微观管理即可监控人工智能系统,确保其保持正常运行并维持效率。

Oversight Capabilities: The ability to monitor AI systems without micromanaging, ensuring they stay on track while maintaining efficiency

自主性平衡:培养一种关于何时施加控制、何时让智能体独立工作的“人工智能直觉”。

Autonomy Balancing: Developing an “AI intuition” about when to exert control versus when to let agents work independently

治理框架:创建用于跟踪、审计和改进人工智能决策的机制

Governance Frameworks: Creating structures to track, audit, and improve AI decisions

伦理风险管理:了解何时以及如何委派职责,同时确保人工智能在伦理界限内运行。

Ethical Risk Management: Understanding when and how to delegate responsibilities while ensuring AI operates within ethical boundaries

“学习如何将工作委托给人工智能比我想象的要难,”我们合作过的一位市场总监坦言,“我必须克服检查每一个操作的冲动,转而专注于评估结果并做出战略调整。”

“Learning to delegate to AI was harder than I expected,” admitted a marketing director we worked with. “I had to overcome the urge to check every single action and instead focus on reviewing the outcomes and making strategic adjustments.”

从简单的互动到真正的协作

From Simple Interactions to True Collaboration

与具有智能体的AI合作需要更复杂的人机协作方法:

Working with agentic AI requires a more sophisticated approach to human-machine collaboration:

能力感知:理解智能体在基本语言处理之外的扩展功能

Capability Awareness: Understanding the expanded functionalities of agents beyond basic language processing

情境式互动:提供高层次的指导,而非循序渐进的指示

Contextual Engagement: Providing high-level guidance rather than step-by-step instructions

协同设计:与人工智能合作,通过迭代改进共同创造解决方案

Collaborative Design: Working with AI to co-create solutions through iterative refinement

战略合作:利用人工智能实现自动化和精准化,同时融入人类的创造力和判断力。

Strategic Partnership: Leveraging AI for automation and precision while applying human creativity and judgment

在我们的金融服务实施过程中,我们发现,将人工智能代理视为“协作工具”的团队取得了明显更好的结果,并且报告了更高的工作满意度。

In our financial services implementation, we found that teams who viewed AI agents as “tools to collaborate with” achieved significantly better results and reported higher job satisfaction.

从增强到价值创造(长期能力)

From Augmentation to Value Creation (longer-term capabilities)

随着智能体接管整个工作流程,人类必须专注于以人工智能无法复制的方式创造价值:

As agents take over entire workflows, humans must focus on creating value in ways AI cannot replicate:

真正的创造力:开发具有情感深度和文化细微差别的新颖解决方案,这些是人工智能无法独立生成的。

Genuine Creativity: Developing novel solutions with emotional depth and cultural nuance that AI can’t independently generate

批判性评估:分析人工智能输出结果中的偏见、伦理考量和长期影响

Critical Evaluation: Analyzing AI outputs for biases, ethical considerations, and long-term impacts

人际关系建立:在人际关系、信任和同理心不可替代的领域表现出色

Relationship Building: Excelling in areas where human connection, trust, and empathy remain irreplaceable

整合专长:了解如何将人工智能能力与人类的独特优势相结合

Integration Expertise: Understanding how to complement AI capabilities with distinctly human strengths

一位参与我们智能体实施的医疗保健高管解释说:“我们发现的悖论是,拥抱自动化实际上使我们独特的人类技能更有价值。我们的员工现在有更多的时间用于与患者建立关系和解决复杂问题——这些只有人类才能真正擅长的事情。”

“The paradox we’ve found,” explained a healthcare executive involved in our agent implementation, “is that embracing automation actually makes our distinctly human skills more valuable. Our staff now spend more time on patient relationships and complex problem-solving—the things only humans can truly excel at.”

通过培养我们人工智能协作能力模型中概述的能力,员工可以将对人工智能的潜在焦虑转化为对新机遇的兴奋。投资于在所有四个维度上培养这些技能的组织会发现,他们的团队不仅能更快地适应人工智能代理,还能发现利用这项技术进行创新的方法,而这些方法可能是技术专家从未设想过的。

By developing the capabilities outlined in our AI Collaboration Capability Model, employees can transform potential anxiety about AI into excitement about new opportunities. Organizations that invest in building these skills across all four dimensions find that their teams not only adapt more quickly to AI agents but also discover innovative ways to leverage the technology that technical experts might never have envisioned.

AICCM 已被证明是我们变革管理方法中一个宝贵的工具,它为技能发展计划提供了清晰的框架,并帮助组织识别具体的技能差距。我们发现,在模型所有四个维度上都表现卓越的团队,其代理成功实施的可能性是那些仅专注于技术集成的团队的三倍。

The AICCM has proven to be a valuable tool in our change management approach, providing a clear structure for skills development programs and helping organizations identify specific capability gaps. We’ve found that teams who excel in all four dimensions of the model are three times more likely to report successful agent implementations than those who focus solely on technical integration.

通过透明度建立信任

Building Trust Through Transparency

我们观察到的最成功的部署案例都有一个共同点:对员工岗位的影响保持高度透明。缺乏这种透明度,员工将不愿信任人工智能代理,并怀疑人工智能的引入是为了优化效率,使其超越人类能力,最终取代人类。换句话说,如果不解释引入人工智能代理背后的逻辑,员工就会对未来感到焦虑,并抵制你设想的变革。当我们意识到有必要提高之前提到的那家保险公司的透明度时,我们在实施过程中调整了策略。

The most successful deployments we’ve witnessed share one common element: radical transparency about the impact on roles. Without this transparency, employees will be reluctant to trust the AI agents and will suspect that AI is being adopted to push efficiency beyond human capabilities and, ultimately, to replace people. In other words, without explaining the rationale behind the decision to integrate AI agents, employees will be anxious about the future and resist the change that you have in mind. When we realized the necessity of enhancing this kind of transparency at the insurance company mentioned earlier, we shifted our approach mid-implementation.

我们没有笼统地宣布人工智能代理将“优化理赔处理”,而是为每个角色创建了详细的影响图,展示了:

Instead of broadly announcing that AI agents would “optimize claims processing,” we created detailed impact maps for each role, showing:

哪些具体任务将由代理人处理?

Which specific tasks would be handled by agents

角色将如何演变以涵盖新的职责

How roles would evolve to include new responsibilities

在变化的环境中,哪些技能会变得更有价值?

What skills would become more valuable in the transformed environment

新架构下有清晰的职业晋升路径。

Clear career progression paths in the new setup

这种透明化策略产生了意想不到的效果。它非但没有加剧焦虑,反而减轻了焦虑,因为人工智能助手被视为提升个人利益和职业前景的途径。事实上,我们的新方法让员工能够清楚地看到自身角色将如何变化,从而使未来更加具体可控。

This transparency strategy had an unexpected effect. Rather than increasing anxiety, it actually reduced it because AI agents were perceived as ways that could enhance one’s own interests and career prospects. Indeed, our new approach allowed employees to see exactly how their roles would change, making the future feel more concrete and manageable.

教育的演变:超越传统培训

The Education Evolution: Beyond Traditional Training

我们早期犯的错误之一是过分依赖正式的培训课程,向员工讲解人工智能代理的潜力。一段时间后,我们意识到,如果想要成功推广人工智能代理,就必须确保人工智能代理的整合与人类的价值观和利益保持一致,同时还要保证代理能够独立运行。

One of our early mistakes was relying too heavily on formal training sessions where we told employees about the potential of an AI agent. After a while, we realized that if we wanted AI agent adoption to succeed, we had to ensure that the integration of an AI agent remained aligned with human values and interests while maintaining the ability of the agent to operate independently.

我们在一家电信公司吸取了惨痛的教训,尽管该公司开展了广泛的培训计划,但其客户服务人工智能代理的采用率仍然很低。

We learned this lesson the hard way at a telecommunications company where, despite conducting extensive training programs, adoption of their customer service AI agents remained low.

突破出现在我们转向“学习实验室”方法之后。我们不再进行纯粹的理论培训,告诉人们人工智能代理能做什么,而是让人们参与到集成过程中来。176

The breakthrough came when we shifted to a “learning laboratory” approach. Instead of purely theoretical training, where we told people what AI agents could do, we had people participate in the integration process itself.176

建立参与式采纳流程将使个人能够亲身体验人工智能代理可能为其工作带来的潜在优势。反过来,这将生成反馈,以便我们作为顾问以及整个组织可以利用这些反馈,确保人工智能代理以对每个人都有意义的方式集成,同时提高工作流程的效率。

Establishing a participatory adoption process would enable individuals to gain firsthand experience of the potential advantages an AI agent could bring to their jobs. This, in turn, would generate feedback that we, as consultants, along with the organization as a whole, could utilize to ensure that AI agents are integrated in ways that make sense to everyone while also enhancing efficiency in the workflow.

为了实现这一目标,我们创建了沙盒环境,让员工可以在日常工作的真实场景中使用人工智能代理进行实验。这种实践经验被证明具有变革性意义。

To achieve this, we created sandbox environments where employees could experiment with AI agents in real-world scenarios from their daily work. This hands-on experience proved transformative.

三支柱学习法

The Three-Pillar Learning Approach

经过反复试验,我们开发出了我们称之为“三支柱学习法”的方法:

Through trial and error, we’ve developed what we call the Three-Pillar Learning Approach:

1. 自主探索:员工将获得专门的时间,在特定的工作环境中体验人工智能代理,通过直接经验而非抽象概念进行学习。理想情况下,这种体验应在代理仍处于原型或概念验证阶段时进行,以便根据人类反馈进行改进。

1. Self-Directed Discovery: Employees are given protected time to experiment with AI agents in their specific work context, learning through direct experience rather than abstract concepts. This experimentation would ideally take place when agents are still prototypes or proofs of concept, and can still be modified based on human feedback.

2. 同伴学习网络:我们建立员工社区,让员工分享他们在使用人工智能代理时的经验、成功案例和失败教训。这些网络常常能发现我们之前未曾考虑过的创新用途,也是识别代理缺陷和改进方案的重要来源。

2. Peer Learning Networks: We establish communities where employees share their experiences, successes, and failures with AI agents. These networks often uncover innovative uses we hadn’t considered, and are also a great source for identifying shortcomings and possible fixes for agents.

3. 情境化培训:传统培训仍然很重要,但应该针对特定的业务职能、角色和用例进行定制,而不是通用的人工智能教育。

3. Contextual Training: Traditional training is still important but should be tailored to specific business functions, roles, and use cases rather than generic AI education.

通过所有权赋能

Empowerment Through Ownership

或许我们最重要的发现来自一家制造企业的经验。最初,该公司领导计划由IT部门管理所有人工智能代理的配置。然而,他们发现,如果员工拥有在既定管理框架内修改和配置自身代理的工具,那么各部门的采用率会显著提高,员工的满意度也会更高。与其将人工智能代理视为必须集成到工作流程中的成品,不如赋予员工控制权,这样可以增强他们的参与度和责任感,从而降低他们对人工智能代理的抵触情绪。

Perhaps our most significant insight came from a manufacturing company’s experience. Initially, the company’s leaders planned for the IT department to manage all AI agent configurations. However, they discovered that departments where employees were given the tools to modify and configure their own agents (within governance frameworks) showed significantly higher adoption rates and reported greater satisfaction. Instead of presenting the AI agent as a finished product that must be integrated into the workflow, allowing employees a sense of control fosters commitment and responsibility, which reduces resistance to adopting the AI agent.

这一观察促成了我们所谓的“渐进式自主模型”的开发:随着员工对人工智能代理的熟练程度不断提高(沙盒环境有助于提升员工的熟练度),他们将获得越来越多的权限来定制和配置这些代理。这种方法形成了一个良性循环——员工会更加投入到他们参与塑造的代理的成功之中。

This observation led to the development of what we call the “Progressive Autonomy Model”: As employees demonstrate proficiency with AI agents (which a sandbox approach can help with), they gain increasing authority to customize and configure them. This approach creates a virtuous cycle—employees become more invested in the success of the agents they help shape.
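To make the Progressive Autonomy Model concrete, here is a minimal sketch of a permission ladder in Python. Everything in it is a hypothetical assumption of ours, not a description of the manufacturer's actual system: the tier names, the action labels, and the 25-point proficiency bands are purely illustrative.

```python
# Illustrative sketch of a "Progressive Autonomy Model": as an employee's
# demonstrated proficiency with an agent grows, the set of agent actions
# they may customize expands. All names and thresholds are hypothetical.

PERMISSION_TIERS = {
    0: {"observe"},                                   # watch agent suggestions only
    1: {"observe", "run_low_risk_tasks"},             # simple, reversible actions
    2: {"observe", "run_low_risk_tasks", "configure_prompts"},
    3: {"observe", "run_low_risk_tasks", "configure_prompts", "edit_workflows"},
}

def allowed_actions(proficiency_score: int) -> set:
    """Map a proficiency score (e.g., 0-100) to a permission tier."""
    tier = min(proficiency_score // 25, max(PERMISSION_TIERS))
    return PERMISSION_TIERS[tier]

print(sorted(allowed_actions(10)))   # a new user can only observe
print(sorted(allowed_actions(80)))   # a proficient user can reshape workflows
```

The point of the sketch is the virtuous cycle described above: permissions grow with demonstrated proficiency, so employees earn a direct stake in the agents they help shape.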

民主化自动化的力量

The Power of Democratized Automation

我们发现,加速人工智能代理应用最有效的策略之一是通过低代码平台普及这项技术。这些工具能够将员工从被动的技术接受者转变为积极的变革参与者。在我们合作的一家大型医疗机构中,这种方法显著加快了变革步伐,并改变了员工的态度。

One of the most effective strategies we’ve discovered for accelerating AI agent adoption is democratizing the technology through low-code platforms. These tools transform employees from passive recipients of technology to active participants in the transformation. At a major healthcare provider we worked with, this approach led to a remarkable shift in both the pace of transformation and employee attitudes.

“起初,我把人工智能代理视为一种威胁,”身为医疗账单专家的马克坦言,“但当我真正能够自己配置自动化程序后,我意识到这是为了让我的工作变得更好,而不是取代我。”马克和他的同事们利用低代码工具,自动处理日常账单查询,同时还能控制这些自动化流程的运行方式。

“Initially, I saw the AI agents as a threat,” admitted Mark, a medical billing specialist. “But once I could actually configure the automation myself, I realized this was about making my job better, not replacing me.” Using low-code tools, Mark and his colleagues were able to automate routine billing queries while maintaining control over how these automations worked.

成功的关键在于将以下四个要素结合起来:

The key to success lies in combining four elements:

易于使用的工具,使每个人,无论技术背景如何,都能参与创建和改进自动化流程。

Accessible tools that allow everyone, regardless of technical background, to participate in creating and improving automations

一个由“自动化倡导者”组成的网络,他们可以指导和支持其他人。

A network of “automation champions” who can guide and support others

代理和工具库,供业务人员在构建工作流程和业务流程自动化时使用。

Repositories of agents and tools that businesspeople can use in assembling workflows and business process automations

清晰的治理框架,既能促进创新,又能保障安全和合规性。

Clear governance frameworks that enable innovation while maintaining security and compliance

这种方法不仅能加速转型,还能建立真正的归属感,并培养人们对项目成功的责任感。事实上,当人们能够直接影响人工智能代理如何辅助他们的工作时,他们就会全身心投入到整个计划的成功之中。

This approach not only accelerates the transformation but also builds genuine ownership and instills a sense of responsibility to make the adoption project succeed. Indeed, when people can directly impact how AI agents assist their work, they become invested in the success of the entire initiative.

一些智能体AI供应商正在通过在其低代码软件版本中构建智能体开发和管理功能来推动这种方法。他们认为,大部分智能体开发工作将由“普通用户”而非专业的IT或AI开发人员完成。我们推荐这种民主化的方法,希望采用这种方法的组织应该寻找这些供应商和工具。

Some agentic AI vendors are facilitating this approach by building agent development and management capabilities into their low-code software versions. They believe that the majority of agent development will be performed by “citizen developers” rather than professional IT or AI developers. Organizations that wish to follow this democratized approach—which we recommend—should seek out these vendors and tools.

激励创新和冒险

Incentivizing Innovation and Risk-Taking

我们观察到(也曾犯过)的一个常见错误是,在奖励员工使用人工智能代理的参与度时,只关注成功的结果。在一家零售公司,我们发现这种方法无意中抑制了实验——员工们害怕尝试可能失败的新方法。当然,如果害怕尝试新事物、探索新的用例,就不会发生失败,但同时也无法学习。事实上,学习的唯一途径就是从失败中吸取经验,以便下次做得更好。

A common mistake we’ve observed (and made ourselves) is focusing solely on successful outcomes when rewarding employee engagement with AI agents. At a retail company, we learned that this approach inadvertently discouraged experimentation—employees were afraid to try new approaches that might fail. Of course, if one is afraid to try new things and explore new use cases, no failures will happen, but at the same time, no learning will take place either. Indeed, the only way to learn is to fail and use that experience to do better next time.

解决方案是调整激励机制,既奖励成功的实施,也奖励有据可查的“学习失败”。这营造了一种重视实验精神和成就的文化。正如我们的合著者之一大卫常说的:“要想快速学习,就要快速失败。”

The solution was to shift the incentive structure to reward both successful implementations and well-documented “learning failures.” This created a culture where experimentation became valued alongside achievement. As David, one of our co-authors, likes to say, “To learn fast, you need to fail fast.”

根据我们的经验,成功的激励计划通常包括:

Based on our experience, successful incentive programs typically include:

识别人工智能代理的新用例

Recognition for identifying new use cases for AI agents

分享经验教训,无论是成功还是失败,都将获得奖励。

Rewards for sharing lessons learned, whether from successes or failures

与人工智能代理专业知识相关的职业晋升机会

Career advancement opportunities tied to AI agent expertise

预留时间用于实验和学习

Protected time for experimentation and learning

除了激励措施之外,成功的组织还会通过设定明确的文化预期来使人工智能的采用成为常态:

Beyond incentives, successful organizations normalize AI adoption by setting clear cultural expectations:

人工智能并非懒惰的捷径。使用人工智能应被视为对人类工作的巧妙提升,而非逃避责任或敷衍了事的借口。

AI is not a shortcut for laziness. Using AI should be seen as a skillful enhancement of human work, not as an excuse to disengage or avoid responsibility.

奖励人工智能辅助的成果。员工不仅应因其人工付出而获得认可,更应因其最终成果的质量而获得认可——无论人工智能是否发挥了作用。如果人工智能有助于做出更明智的决策、撰写更完善的报告或更快地找到解决方案,那么使用它的人就应该因其有效利用技术而获得嘉奖。

Reward AI-assisted outcomes. Employees should be recognized not just for manual effort but for the quality of results—regardless of whether AI played a role. If AI helps generate a better decision, a stronger report, or a faster solution, the human using it should receive credit for leveraging technology effectively.

通过将激励机制与文化强化相结合,组织可以营造一种人工智能被广泛接受、专业技能得到奖励、创新蓬勃发展的环境。其目标不仅仅是采用人工智能,而是精通人工智能,即人类与人工智能协同工作,取得卓越成果。

By combining incentives with cultural reinforcement, organizations create an environment where AI is embraced, expertise is rewarded, and innovation thrives. The goal is not just AI adoption—it’s AI mastery, where humans and AI work together to achieve superior results.

构建可持续变革管理框架

Building a Sustainable Change Management Framework

基于我们的经验,我们开发了一套全面的框架,用于管理人工智能代理部署中的变更。关键在于认识到变更管理并非一次性事件,而是一个随着技术发展而不断演进的持续过程。

Through our experiences, we’ve developed a comprehensive framework for managing change in AI agent deployments. The key is to recognize that change management is not a one-time event, but an ongoing process that evolves alongside the technology.

组织及其领导者必须从一开始就设定明确的预期:采用人工智能代理需要时间,并且需要所有相关人员的积极参与。研究表明,人工智能代理刚引入时,效率往往会暂时下降,因为人们需要适应与新的数字同事合作。在此期间,员工不仅要学习如何与人工智能协作,还要培养优化人机交互的新技能。

Organizations and their leaders must set clear expectations from the start: adopting AI agents takes time and requires active participation from everyone involved. Research shows that when AI agents are first introduced, efficiency often declines temporarily as people adjust to working with their new digital coworkers. During this period, employees are not only learning how to collaborate with AI but also developing new skills to optimize human-machine interaction.

然而,一旦这些技能被掌握,新的规范和激励机制建立起来,生产力就开始提升。随着时间的推移,组织将达到人机协同的新高度,人工智能将增强工作流程,释放前所未有的效率和价值。177

However, once these skills are mastered and new norms and incentive structures are established, productivity begins to rise. Over time, organizations reach a new level of human-machine synergy, where AI enhances workflows, unlocking greater efficiency and value than ever before.177

这一转型遵循J型曲线效应——初期效率会下降,随后反弹并加速超越人工智能出现之前的水平。理解这条曲线至关重要:短期适应性挑战并非失败的标志,而是迈向长期转型的必要步骤。能够认识到这种动态变化并提前规划的组织,将最有利于充分发挥人工智能的潜力。178

This transition follows a J-curve effect—initially, efficiency dips before rebounding and accelerating beyond pre-AI levels. Understanding this curve is critical: short-term adaptation challenges are not signs of failure but necessary steps toward long-term transformation. Organizations that recognize and plan for this dynamic will be best positioned to harness AI’s full potential.178
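The J-curve can be illustrated with a toy model: an adaptation cost that fades over time and an AI benefit that compounds as skills mature. The coefficients below are entirely made up for illustration; they are not derived from the research cited here.

```python
# Toy J-curve: productivity dips during adaptation, then climbs past baseline.
# All numbers are illustrative assumptions, not empirical estimates.

def productivity(week: int, baseline: float = 100.0) -> float:
    adaptation_cost = 30 * (0.7 ** week)   # learning overhead fades over time
    ai_gain = 60 * (1 - 0.85 ** week)      # AI benefits compound as skills mature
    return baseline - adaptation_cost + ai_gain

for week in (0, 2, 6, 12):
    print(week, round(productivity(week), 1))
```

Plotting the values shows the characteristic shape: an initial dip below the pre-AI baseline, then a rebound that eventually exceeds it.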

通过逐步自主建立信任

Building Trust Through Graduated Autonomy

在将人工智能代理打造为值得信赖的工作伙伴的过程中,我们了解到信任是一种双向互动。首先,正如我们之前提到的,组织需要让员工参与到人工智能代理的采纳和使用调整过程中。正如我们解释的,通过让员工积极参与人工智能代理在工作流程中的整合,他们将亲身体验到这些代理如何帮助他们实现目标,并以有趣且富有创意的方式重塑他们的工作。这种方法意味着组织需要“给予”员工信任,让他们将人工智能代理转化为协作伙伴。179

In making AI agents trusted work partners, we learned that trust runs in two directions. The first direction, which we noted earlier, concerns the organization’s need to involve employees in the process of adopting and modifying the use of an AI agent. As we explained, by turning employees into active players in the integration of AI agents into the workflow, they will experience for themselves how these agents can serve their interests and reshape their jobs in interesting and creative ways. In this approach, organizations “give” trust to their own employees to turn an AI agent into a collaborator.179

第二种方法是让AI代理自身说服员工相信它值得信赖。在这种情况下,信任需要(由AI代理)“赢得”。根据我们的经验,这种赢得信任的原则对于成功部署AI代理至关重要。事实上,我们在一家金融服务公司深刻体会到了这一点,该公司最初对AI代理的抵触情绪尤为强烈。最终,当我们实施了我们称之为“信任拨号”的方法后,取得了突破性进展。

The second approach is that the AI agent itself needs to convince the employee that it can be trusted. In this case, trust needs to be “earned” (by the AI agent). In our experience, this earning principle has proven crucial in successful AI agent deployments. In fact, we learned this lesson vividly at a financial services firm where initial resistance to AI agents was particularly strong. The breakthrough came when we implemented what we call the “trust dial” approach.

我们没有急于实现完全自动化,而是创建了一个循序渐进的系统,用户可以控制赋予人工智能代理的自主程度。用户可以从代理的“观察模式”开始,在这种模式下,他们可以观察代理的行动,而无需实际执行任何操作。随着用户信心的增强,他们可以逐步“提升”代理的自主性——首先允许其处理简单、低风险的任务,然后随着代理可靠性的提高,逐步扩大其权限。从某种意义上说,我们创建了另一个沙箱,员工可以在其中测试与人工智能代理协作时的反应。

Instead of pushing for immediate full automation, we created a graduated system where users could control how much autonomy they gave to their AI agents. They could start with the agents in “observation mode,” where they could see what the agent would do without actually executing actions. As users gained confidence, they could gradually “dial up” the agent’s autonomy—first allowing it to handle simple, low-risk tasks, then progressively expanding its authority as it proved its reliability. In a way, we created another sandbox where employees could test the waters when collaborating with the AI agent.
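One way to picture the trust dial is as a small gating function: at dial 0 the agent only proposes actions, and each higher setting authorizes riskier tasks. The class name, dial scale, and risk thresholds below are our own illustrative assumptions, not the firm's actual implementation.

```python
# Hypothetical sketch of the "trust dial". Dial 0 = observation mode (the
# agent only proposes); higher settings let it execute progressively riskier
# tasks on its own, escalating anything above its authorized risk ceiling.

from dataclasses import dataclass

@dataclass
class Task:
    name: str
    risk: int  # 0 = trivial, 10 = high stakes (illustrative scale)

class TrustDialAgent:
    def __init__(self, dial: int = 0):
        self.dial = dial  # user-controlled autonomy setting

    def handle(self, task: Task) -> str:
        if self.dial == 0:
            return f"PROPOSED (not executed): {task.name}"
        if task.risk <= self.dial * 2:  # dial 1 allows risk <= 2, dial 3 allows <= 6, ...
            return f"EXECUTED: {task.name}"
        return f"ESCALATED to human: {task.name}"

agent = TrustDialAgent(dial=0)
print(agent.handle(Task("categorize claim", risk=1)))      # observation mode: proposes only

agent.dial = 2                                             # user dials up after gaining confidence
print(agent.handle(Task("categorize claim", risk=1)))      # now executed autonomously
print(agent.handle(Task("approve large payout", risk=9)))  # still escalated to a human
```

The design choice worth noting is that the human, not the system, turns the dial, which is exactly what converted skeptics into advocates in the deployment described above.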

“这就像培训新团队成员一样,”公司高级运营经理詹妮弗解释说,“你不会在第一天就给他们完全的自主权。你会先让他们从一些小任务中证明自己。”这种方法让持怀疑态度的人能够掌控采用新方法的节奏,从而将他们转变为支持者。

“It’s like training a new team member,” explained Jennifer, a senior operations manager at the firm. “You don’t give them full autonomy on day one. You let them prove themselves with smaller tasks first.” This approach transformed skeptics into advocates by giving them control over the pace of adoption.

这种建立信任方法的关键要素包括:

The key elements of this trust-building approach include:

从高可见性和低自主性开始

Starting with high visibility and low autonomy

为扩展代理能力创建明确的检查点

Creating clear checkpoints for expanding agent capabilities

保持代理行为的透明审计跟踪

Maintaining transparent audit trails of agent actions

建立易于使用的覆盖机制

Establishing easy-to-use override mechanisms

庆祝和分享团队内部的成功案例

Celebrating and sharing success stories across teams
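Two of the elements above, transparent audit trails and easy-to-use override mechanisms, lend themselves to a brief sketch. All class and method names here are hypothetical.

```python
# Illustrative sketch: every agent action is recorded in an audit log, and a
# single call lets a human pause the agent. Names are our own assumptions.

import datetime

class AuditedAgent:
    def __init__(self):
        self.audit_log = []      # transparent record of everything the agent did
        self.overridden = False  # simple off switch for human operators

    def act(self, action: str) -> str:
        if self.overridden:
            result = "SKIPPED (human override active)"
        else:
            result = f"DONE: {action}"
        # log the attempt either way, so the trail stays complete
        self.audit_log.append((datetime.datetime.now().isoformat(), action, result))
        return result

    def override(self):
        """One-call mechanism for a human to pause all agent actions."""
        self.overridden = True

agent = AuditedAgent()
agent.act("send renewal reminder")
agent.override()
agent.act("adjust premium quote")
for timestamp, action, result in agent.audit_log:
    print(action, "->", result)
```

Note that overridden actions are still logged: an audit trail that only records successes would undermine the transparency it is meant to provide.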

展望未来:为高级代理功能做好准备

Looking Ahead: Preparing for Advanced Agent Capabilities

虽然目前的部署通常处于智能体人工智能发展框架的第 1-3 级,但组织必须为最终出现更复杂的智能体(第 4 级和第 5 级)做好准备。这种准备工作包括:

While current deployments typically operate at Levels 1-3 of the Agentic AI Progression Framework, organizations must prepare for the eventual emergence of more sophisticated agents at Levels 4 and 5. This preparation involves:

开发用于管理日益自主系统的框架

Developing frameworks for managing increasingly autonomous systems

建立能够随着能力发展而演进的治理结构

Creating governance structures that can evolve with advancing capabilities

培养能够与未来人工智能代理互补而非竞争的技能

Building skills that will complement rather than compete with future AI agents

为工作被代理取代的员工制定计划和替代角色

Developing plans and alternative roles for employees whose jobs are taken over by agents

例如,总部位于瑞典的在线购物信贷公司Klarna曾因其打造的“一人能处理700人工作”的客服代理而受到媒体广泛关注。180然而,一些评论者,例如Mandel,181发现该代理会犯多种类型的错误,并且经常将客户转接给人工客服。Klarna的首席执行官透露,这700个岗位只会通过自然减员的方式逐步淘汰;182在此期间,这些员工将作为客户服务的第二线,或被重新分配到其他相关任务。Klarna主要将客户服务外包,并计划将相关员工人数从3000人减少到2000人。因此,人工客服并不会消失。

For instance, at Klarna, the online shopping credit company based in Sweden, the media focused extensively on the customer service agent it created that could “handle the jobs of 700 people.”180 However, some reviewers, such as Mandel,181 found that the agent made multiple types of mistakes and often referred customers to a human agent. Klarna’s CEO revealed that the 700 roles would be eliminated only through attrition;182 in the meantime, those employees would serve as the second line of customer response or be reassigned to other related tasks. Klarna primarily outsources its customer service, and it plans to reduce the number of workers involved from 3,000 to 2,000. Humans, therefore, are not going away.

人机协作的未来之路

The Path Forward in Human-Agent Collaboration

在人工智能代理部署中成功管理人为因素,需要在透明度、教育、赋能和激励之间取得微妙的平衡。其目标是将智能体人工智能的速度、效率和可靠性与人类独有的灵活性、批判性思维和全局观相结合。正如我们从成功和失败中汲取的经验教训,人工智能的推广应用需要与人们的兴趣和理解直接相关,这意味着人的因素与技术实现同等重要。

Successful management of the human dimension in AI agent deployments requires a delicate balance of transparency, education, empowerment, and incentivization. The goal is to combine the speed, efficiency, and reliability of agentic AI with the flexibility, critical thinking, and big-picture perspective that only humans can provide. As we’ve learned through our successes and failures, the adoption process needs to connect directly with people’s interests and understanding, which means the human element is just as crucial as the technical implementation.

还记得我们开篇故事里的弗洛拉吗?在我们第一次交谈六个月后,她已经成为部门里人工智能代理最坚定的支持者之一。“这不是要取代我们,”她现在告诉同事们,“而是要赋予我们更多。”“这些工具能帮助我们做更有意义的工作。”她从怀疑者到拥护者的转变,体现了有效的工作设计和变革管理在改变组织采用人工智能代理并从中受益方面的力量。

Remember Flora from our opening story? Six months after our initial conversation, she had become one of the strongest advocates for AI agents in her department. “It’s not about replacing us,” she now tells her colleagues. “It’s about giving us the tools to do more meaningful work.” Her journey from skeptic to champion exemplifies the power of effective work design and change management in transforming how organizations adopt and benefit from AI agents.

信息很明确:不要把工作设计和变革管理视为需要克服的障碍,而应将其视为创造一支更积极投入、技能更娴熟、适应性更强的员工队伍的机会,以便充分发挥人工智能代理的潜力,从而提升人类的绩效和兴趣。

The message is clear: Don’t approach work design and change management as a barrier to overcome but as an opportunity to create a more engaged, skilled, and adaptable workforce ready to harness the full potential of AI agents that will uplift human performance and interests.

人工智能时代的领导力:在混合团队中建立信任与协作

Leadership in the Age of AI Agents: Building Trust and Collaboration in Hybrid Teams

想象一下,某天早上走进办公室,发现团队里一半的成员都不是人类。这并非科幻小说——而是不久的将来,人工智能代理将与人类员工并肩工作。但是谁带领组织完成了这一转型?他们又是如何实现的?转型之后,如何领导这样一支混合型团队?如何建立人机之间的信任?这些问题并非纸上谈兵,随着各组织开始将人工智能代理融入工作流程,它们的重要性也日益凸显。

Imagine walking into your office one morning to find that half of your team members aren’t human. This isn’t science fiction—it’s the near future of work, where AI agents will collaborate alongside human employees. But who led the organization through that transition, and how did they go about it? Once the transformation is complete, how do you lead such a hybrid team? How do you build trust between humans and machines? These questions aren’t just theoretical; they’re becoming increasingly relevant as organizations begin integrating AI agents into their workflows.

新的领导模式:从控制到协作

The New Leadership Paradigm: From Control to Collaboration

传统的领导模式是为人类与人类合作的世界而构建的。旧的领导方法将人视为理性个体,并将其视为组织结构的一部分。最重要的是控制层级结构和工作流程,因此命令控制型领导方式占据主导地位。183

Traditional leadership models were built for a world where humans worked with other humans. The old leadership approaches considered humans rational beings who were part of the organizational structure. What mattered most was controlling the hierarchy and workflow, so the command-and-control approach dominated as a leadership style.183

多年后,新的领导力理论指出了命令控制式领导方式的局限性,认为当人们愿意追随领导者——也就是认同领导者的理念——时,他们会更加投入和积极。因此,领导者更加注重目标明确、善于建立人际关系和赢得信任,从而使人们认同他们的信息,并愿意服从领导者的指令和要求。184

Years later, new leadership theories identified the limitations of the command-and-control approach by identifying that humans are more committed and motivated when they themselves are willing to follow a leader—when they buy into the story of the leader. As a result, leaders were focused more on being purposeful, able to build interpersonal relations, and establishing trust so people would buy into their message and, as such, willing to comply with the directives and requests of the leader.184

传统的领导力思维通常不认为领导力是多维的,可以融合多种风格。它往往被视为非此即彼的选择。

Traditional leadership thinking typically does not recognize leadership as multi-dimensional, where various styles can be integrated. It is often perceived as either one approach or another.

然而,根据我们为财富500强企业提供的咨询经验,人工智能代理的引入正在从根本上改变人们对领导力的看法。领导一支由人类和人工智能代理组成的混合团队,需要重新评估领导原则。这需要探索如何将不同的领导风格结合起来,以有效地促进协作并提升团队成功率。

However, based on our consulting experience with Fortune 500 companies, the introduction of AI agents is fundamentally altering this perspective on leadership. Leading a hybrid team comprising both humans and AI agents necessitates a re-evaluation of leadership principles. It requires an exploration of how different styles may be combined to effectively promote collaboration and enhance team success.

以我们最近合作过的一家全球制造公司的项目经理阿尼尔为例。她的团队既包括人类分析师,也包括负责数据处理和基本决策的3级人工智能代理。“起初,我试图用管理人类团队成员的方式来管理人工智能代理,”她告诉我们,“但我很快意识到这是错误的方法。人工智能代理不需要激励或情感支持——它们需要明确的目标和清晰的参数。与此同时,我的团队成员也需要帮助,才能理解如何与人工智能同事有效协作。”

Consider Anil, a project manager at a global manufacturing company we worked with recently. Her team included both human analysts and Level 3 AI agents handling data processing and basic decision-making. “At first, I tried to manage the AI agents the same way I managed my human team members,” she told us. “I quickly realized this was the wrong approach. The AI agents didn’t need motivation or emotional support—they needed clear objectives and well-defined parameters. Meanwhile, my human team members needed help understanding how to collaborate effectively with their AI counterparts.”

这个例子清楚地表明,当今的领导者必须找到合适的方法来发展人机之间更紧密的协作关系。在这些混合团队环境中,领导者需要努力营造一种氛围,使人们愿意与人工智能代理建立工作关系,这些代理可以进行双向对话,提出替代方案,甚至挑战我们的假设,从而带来新的方法和解决方案。为此,这些混合团队的领导者需要采取双管齐下的领导方式。

This example makes clear that leaders today will have to find the right approach to develop more collaborative relationships between humans and machines. In these hybrid team settings, leaders need to strive for a context where humans are willing to develop a working relationship with AI agents that can engage in back-and-forth dialogue, propose alternative ideas, and may even challenge our assumptions and, as such, lead to new approaches and solutions. To do so, leaders of these hybrid teams need a dual approach to leadership.

他们必须一方面通过理性且有控制力的方式满足人工智能代理的逻辑和客观需求,另一方面又要支持团队成员的情感和发展需求,从而激励他们服从和协作。这种双重关注构成了我们所说的“领导力二元性原则”。

They must simultaneously manage the logical, objective needs of AI agents by acting in controlling and rational ways while at the same time supporting the emotional and developmental needs of human team members so they will feel inspired to comply and collaborate. This dual focus creates what we call the “Leadership Duality Principle.”

这种转变伴随着组织结构的变革。传统的多层级管理结构正逐渐被扁平化、更具活力的组织所取代,人工智能代理将承担许多中层管理任务,例如日程安排、资源分配和绩效跟踪。其结果是形成一个更加有机的组织,人类可以专注于高层战略、创新和人际领导力。高层领导者需要分析并可能尝试调整公司的组织结构以及各层级管理人员的数量和类型。

This shift is accompanied by a transformation in organizational structure. Traditional hierarchical structures with multiple management layers are giving way to flatter, more dynamic organizations where AI agents handle many middle-management tasks like scheduling, resource allocation, and performance tracking. The result is a more organic organization where humans focus on high-level strategy, innovation, and interpersonal leadership. Senior leaders will need to analyze and perhaps experiment with their companies’ organizational structures and numbers and types of managers at each level.

另一个重大变化体现在决策方式上。如今的领导者通常主要依据数据分析和绩效指标做出决策,而未来的领导者则需要平衡人工智能生成的洞察与人类判断。例如,当人工智能代理提出基于数据的市场扩张建议时,领导者的角色不再仅仅是批准或拒绝该分析,而是要考虑该分析如何与公司的价值观、长期愿景和更广泛的社会影响保持一致。事实上,领导者可以查看所有可用的数据,但最终,他们必须停止仅仅查看数据,转而利用数据生成的洞察和建议来做出决策,并为利益相关者创造有意义的价值。

Another significant change is in how decisions are made. While today’s leaders often base decisions primarily on data analysis and performance metrics, tomorrow’s leaders will need to balance AI-generated insights with human judgment. For instance, when an AI agent presents a data-driven recommendation for market expansion, the leader’s role isn’t to simply approve or reject the analysis, but to consider how it aligns with the company’s values, long-term vision, and broader societal impact. In reality, leaders can look at all the data available, but at some point they must stop looking at the data and instead use the insights and recommendations it generates to make decisions and create value that makes sense for their stakeholders.

在混合团队中建立信任

Building Trust in Hybrid Teams

信任是高效团队的基础,但当团队成员中有机器时,信任又该如何运作呢?关于人机协作中信任的研究和调查,例如 Duan 等人 (2024) 185和 Hoff 和 Bashir (2015) 186的研究,可以为理解这种动态提供宝贵的见解。通过我们的研究和实践经验,我们总结了混合团队中信任的三个关键维度:

Trust is the foundation of effective teams, but how does trust work when some team members are machines? Studies and surveys on trust in human-AI collaboration, such as those by Duan et al. (2024)185 and Hoff and Bashir (2015)186, can provide valuable insights into this dynamic. Through our research and implementation experience, we’ve identified three key dimensions of trust in hybrid teams:

1.人机信任:人类需要信任人工智能代理能够可靠且合乎伦理地完成任务。这种信任的建立依赖于透明度、持续的性能表现,以及组织及其领导层对人工智能能力和局限性的清晰沟通。187

1. Human-to-AI Trust: Humans need to trust that AI agents will perform their tasks reliably and ethically. This trust is built through transparency, consistent performance, and clear communication of AI capabilities and limitations by the organization and its leadership.187

2.AI与人类任务交接:在AI代理与人类之间进行任务交接时,信任至关重要。人类团队成员需要相信他们从AI代理那里收到的工作成果是准确、完整且公正的。任务需要简单明了,以便于检查。

2. AI-to-Human Handoff: Trust is crucial during task handoffs between AI agents and humans. The human team members need to trust that the work they receive from AI agents is accurate, complete, and unbiased. Tasks need to be simple and transparent so they can be checked.

3.人工智能环境下的人与人之间的信任:人类需要信任彼此的判断并拥有共同的价值观,才能与人工智能代理合作并监督它们。

3. Human-to-Human Trust in an AI Context: Humans need to trust each other’s judgment and share values to work with and oversee AI agents.

让我们来看看这在实践中是如何体现的。在最近与一家金融服务公司合作的项目中,我们部署了三级人工智能代理来处理客户的初步咨询和基本分析。起初,人工客服代表表现出了很大的抵触情绪,他们担心人工智能会取代他们,或者犯下需要他们来纠正的错误。

Let’s examine how this plays out in practice. In a recent project with a financial services firm, we implemented Level 3 AI agents to handle initial customer inquiries and basic analysis. The human customer service representatives initially showed significant resistance, fearing the AI would either replace them or make mistakes that they’d have to fix.

为了建立信任,我们实施了透明的协作框架:

To build trust, we implemented a transparent collaboration framework:

清晰的能力沟通:我们帮助团队准确了解 AI 代理能够做什么和不能做什么,使用代理 AI 发展框架来解释他们当前的 3 级能力。

Clear Capability Communication: We helped the team understand exactly what the AI agents could and couldn’t do, using the Agentic AI Progression Framework to explain their current Level 3 capabilities.

可见的成功指标:我们创建了仪表盘,显示人工智能和人类团队成员的准确性和效率,帮助每个人了解他们的互补优势,从而最大限度地减少协调和协作方面的问题。

Visible Success Metrics: We created dashboards showing the accuracy and efficiency of both AI and human team members, helping everyone understand their complementary strengths so that problems of coordination and collaboration are minimized.

渐进式集成:我们从简单的任务开始,随着团队越来越适应,逐步增加 AI 代理的职责。

Progressive Integration: We started with simple tasks and gradually increased the AI agents’ responsibilities as the team grew more comfortable.

三个月内,信任度显著提高,混合团队的绩效指标比之前的指标高出 40%。

Within three months, trust levels had significantly improved, and the hybrid team was outperforming previous metrics by 40%.

建立信任是一个循序渐进的过程。

Building Trust is a Progressive Exercise

我们观察到,在众多项目中,利用人工智能代理建立信任的过程遵循着一个引人入胜的模式。最近,在与一家全球制造企业合作的过程中,我们亲眼见证了这一过程的实时展开。该公司部署了一个人工智能代理来优化其供应链决策,而信任的演变经历了三个截然不同的阶段:

The journey of building trust with AI agents follows a fascinating pattern we’ve observed across numerous implementations. Recently, while working with a global manufacturing company, we witnessed this journey unfold in real time. The company had implemented an AI agent to optimize its supply chain decisions, and the evolution of trust followed three distinct phases:

Phase 1: The Verification Phase

Initially, employees approached the agent with healthy skepticism, meticulously checking every suggestion it made. Think of it like training a new employee—you want to verify their work until you’re confident in their abilities. During this phase, the manufacturing team spent hours validating the agent’s inventory recommendations.

Phase 2: Calibrated Trust

After about three months, something interesting happened. The team began developing what we call “calibrated trust”—they started understanding where the agent excelled and where it needed human oversight. They learned, for instance, that the agent was exceptional at predicting routine supply needs but needed human input for unusual situations like sudden market changes or emergency orders. This big-picture thinking has always been a human role, and it is particularly important when the world changes, but the AI models have not.

Phase 3: Partnership

The final phase emerged after about six months: true partnership. At that time, the AI agents were accepted as promoting the interests of the employees and the organization. The team had developed such an efficient collaboration with their AI agent that they reduced decision-making time by 60% while improving accuracy. They weren’t blindly trusting the agent—instead, they had developed a nuanced understanding of how to work together effectively.

Setting Boundaries

Effective collaboration with AI agents requires maintaining appropriate levels of oversight. Research by Gleave and McLean emphasizes that even advanced AI systems can be vulnerable to failures and errors.<sup>188</sup> This is particularly true for generative AI models, which will often be the primary AI type used in agents. Therefore, establishing monitoring protocols is crucial, particularly for high-stakes tasks.

The key is finding the right balance—monitoring should be sufficient to catch significant errors but not so intensive that it negates the efficiency benefits of using AI. This balance will vary depending on the task context and the potential consequences of errors.

This is why it is important to establish clear boundaries and oversight mechanisms. Through our implementations across industries, we’ve developed what we call the “decision control framework.” This framework helps organizations build trust gradually with AI agents while maintaining appropriate control and oversight.

1. Strategic Decisions: For forward-looking decisions requiring complex judgment, AI agents should support but not make decisions. Human judgment remains essential here. These types of decisions don’t happen frequently, so they are difficult to model or train AI systems on. For example, when working with a retail chain’s inventory management system, decisions about entering new markets or launching major promotional campaigns remained firmly in human hands, with the agent providing data analysis and market insights to support these decisions.

2. Tactical Decisions: For adaptive decisions requiring moderate complexity, agents can suggest actions but shouldn’t execute them without human approval. In our retail example, the AI agent would recommend inventory adjustments based on predicted demand patterns and suggest pricing modifications for specific products, but humans would review and approve these recommendations before implementation. The frequency of human review may depend in part on the economic value of the decisions being made.

3. Operational Decisions: For routine, repetitive decisions, appropriate agents can operate autonomously within clearly defined parameters. The retail system could automatically reorder standard items when inventory reached predetermined levels, but only within specific budget and quantity constraints. Similarly, in a customer service context, agents could automatically route inquiries to appropriate departments based on content analysis, but would escalate complex cases to human supervisors.
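
The three tiers above map naturally onto a simple routing rule: strategic decisions stay with humans, tactical actions wait for sign-off, and operational actions run autonomously only inside hard limits. The sketch below is a hypothetical illustration of such a decision control gate; the tier names, the `route` helper, and the budget threshold are assumptions, not taken from any actual implementation described in the text:

```python
from dataclasses import dataclass
from enum import Enum

class Tier(Enum):
    STRATEGIC = "strategic"      # AI advises only; humans decide
    TACTICAL = "tactical"        # AI proposes; a human must approve
    OPERATIONAL = "operational"  # AI may act within defined limits

@dataclass
class Decision:
    tier: Tier
    description: str
    amount: float  # economic value of the action, used as a guardrail

BUDGET_LIMIT = 10_000.0  # illustrative cap for autonomous actions

def route(decision: Decision) -> str:
    """Decide how a hybrid team should handle a given decision."""
    if decision.tier is Tier.STRATEGIC:
        return "human-decides"         # agent supplies analysis only
    if decision.tier is Tier.TACTICAL:
        return "needs-human-approval"  # agent recommends, human signs off
    # Operational: autonomous only inside clearly defined parameters.
    if decision.amount <= BUDGET_LIMIT:
        return "agent-executes"
    return "needs-human-approval"      # escalate when limits are exceeded
```

The escalation path for out-of-bounds operational actions mirrors the customer service example: routine cases run autonomously, while complex or costly ones go to a human supervisor.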

This framework provides a structured approach to building trust over time. As teams become more comfortable with an agent’s performance at one level, they can gradually expand its autonomy while maintaining appropriate oversight. Agents can also be trained on new data over time, which should improve their performance and decision accuracy.
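
One way to make "gradually expand its autonomy" concrete is to gate autonomy on measured accuracy over a rolling window of recent, human-verified cases. The class below is a hypothetical sketch; the window size and thresholds are assumptions chosen for illustration:

```python
from collections import defaultdict, deque

class TrustTracker:
    """Grant an agent autonomy on a task type only after it has proven
    itself on enough recent, human-verified cases (hypothetical sketch)."""

    def __init__(self, window: int = 50, threshold: float = 0.95,
                 min_cases: int = 20):
        self.threshold = threshold
        self.min_cases = min_cases
        # Rolling window of pass/fail outcomes per task type.
        self.history = defaultdict(lambda: deque(maxlen=window))

    def record(self, task_type: str, correct: bool) -> None:
        """Log a human verdict on one agent output."""
        self.history[task_type].append(correct)

    def autonomy_granted(self, task_type: str) -> bool:
        cases = self.history[task_type]
        if len(cases) < self.min_cases:
            return False  # verification phase: humans still check everything
        return sum(cases) / len(cases) >= self.threshold
```

This mirrors the three trust phases described earlier: all-manual verification at first, calibrated trust per task category once enough evidence accumulates, and sustained autonomy only while performance holds.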

Communication Strategies in Hybrid Teams

Communication in hybrid teams requires a new approach. While current AI agents (Levels 1-3) can process and respond to natural language, they lack the nuanced understanding that humans possess. This creates what we call the “Communication Gap.”

To bridge this gap, we’ve developed the “Hybrid Team Communication Protocol”:

1. Clear Command Structures: Using unambiguous language when communicating with AI agents while maintaining natural conversation styles with human team members.

2. Context Management: Ensuring all team members, both human and AI, have access to relevant context for their tasks. This is particularly important for Level 3 agents that can understand and work with context.

3. Feedback Loops: Establishing regular feedback mechanisms between human team members and AI agents. This feedback cycle should be facilitated by human leaders who are AI-savvy enough to integrate technical information and data with the company’s human and brand values. This way, they can adjust the AI models, if needed, to enhance performance.

A technology company we advised implemented this protocol with remarkable success. They created a standardized communication framework where AI agents would preface their outputs with confidence levels and any assumptions made, making it easier for human team members to evaluate and work with the information.
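
That kind of standardized framing can be captured in a small data structure: each agent output carries a self-reported confidence score and an explicit list of assumptions, and a simple rule decides when a human should review it first. The shape of `AgentOutput` and the 0.8 review threshold below are illustrative assumptions, not the firm’s actual framework:

```python
from dataclasses import dataclass, field

REVIEW_THRESHOLD = 0.8  # below this, a human reviews before use (assumed)

@dataclass
class AgentOutput:
    """An agent reply prefaced with confidence and assumptions, so human
    teammates can judge how much verification it needs."""
    content: str
    confidence: float  # 0.0 to 1.0, self-reported by the agent
    assumptions: list = field(default_factory=list)

    def render(self) -> str:
        """Preface the content with confidence and assumptions."""
        header = f"[confidence: {self.confidence:.0%}]"
        if self.assumptions:
            header += " [assumes: " + "; ".join(self.assumptions) + "]"
        return f"{header}\n{self.content}"

def needs_human_review(output: AgentOutput) -> bool:
    """Flag low-confidence or assumption-laden outputs for a human."""
    return output.confidence < REVIEW_THRESHOLD or bool(output.assumptions)
```

Making confidence and assumptions explicit in the output itself is what closes the “Communication Gap”: human teammates no longer have to guess how much to trust each message.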

Future-Proofing Leadership Skills

As AI agents continue to evolve through the Progression Framework levels, leaders must develop new skills to stay effective. Based on our research and experience, we’ve identified key competencies that leaders need to develop:

1. AI Literacy: Understanding the capabilities, limitations, and potential of AI agents at different levels. This isn’t about becoming a technical expert but about understanding how to leverage AI effectively.

2. Hybrid Team Orchestration: The ability to coordinate between human and AI team members, ensuring optimal task allocation, selecting the right employees to pair with agentic roles, and facilitating collaboration.

3. Ethical Oversight: As AI agents take on more complex tasks, leaders must ensure ethical considerations are properly addressed. It is about setting the norms for the right design and use of AI agents.

4. Change Management in the AI Era: The ability to guide teams through the continuous evolution of AI capabilities and workplace integration. This includes building skills of resilience, agility, and curiosity.

Organizational Implementation Framework: Building Cross-Functional Teams

The future of work isn’t just about adding AI to existing structures—it’s about reimagining how teams operate. We’re seeing the emergence of cross-functional, dynamic teams where agentic systems operate across multiple functions simultaneously. For example, a single agentic system might handle marketing automation, financial forecasting, and customer service concurrently, while humans focus on strategy and creative solutions.

Successfully implementing these hybrid teams requires a structured approach. Drawing from frameworks like the 2024 study on hybrid work models can provide valuable support for this strategy.<sup>189</sup> We’ve developed the “Hybrid Team Integration Model” based on our experience with numerous organizations:

Phase 1: Laying the Groundwork for AI-Augmented Collaboration

To set up a successful hybrid team model where AI agents and humans collaborate seamlessly, the first step is defining how AI will amplify human capabilities. Start by identifying workflows where AI can handle repetitive, data-heavy tasks, allowing humans to focus on creative and strategic work. Conduct workshops with teams to map out the division of responsibilities between AI agents and people, ensuring clarity and alignment. Equip teams with the right tools, such as AI-driven task management platforms or decision-support systems, and train them to interact with these technologies effectively. Leaders must be empowered with the skills to guide both human and AI team members, focusing on fostering trust in the technology while maintaining human oversight. This phase is all about creating a foundation of trust, structure, and readiness for collaboration.

Phase 2: Testing and Refining the Human-AI Dynamic

With the groundwork laid, the next phase is piloting hybrid teams to test how humans and AI agents can co-create value in real-world scenarios. Choose a specific team or project where AI can demonstrate its potential—such as customer support teams using generative AI for faster responses or marketing teams employing AI for campaign optimization. Establish clear communication protocols: for example, AI drafts proposals while humans refine and approve them. Regular feedback loops are critical here; conduct weekly retrospectives where team members share insights on how AI tools performed and where they fell short. This phase is about experimentation, tweaking processes, and ensuring the human-AI collaboration feels intuitive and productive.

Phase 3: Scaling Human-AI Synergy Across the Organization

After refining the collaboration model during the pilot, the next step is to scale AI adoption across teams and departments. To build confidence in AI-enhanced teamwork, leverage data and testimonials from the pilot phase, showcasing real examples of how AI has improved efficiency, decision-making, and outcomes.

As AI agents roll out more broadly, they should be tailored to the unique needs of each team. For example, sales teams may benefit from AI-driven lead prioritization, while R&D teams can use AI to accelerate data analysis and drive faster innovation cycles. This customization ensures AI integration feels purposeful and valuable, rather than a one-size-fits-all mandate.

Beyond deployment, building a culture of human-AI synergy is key. Recognize and reward teams that effectively combine human creativity with machine precision—whether through public recognition, financial incentives, or performance-based rewards. At the same time, invest in ongoing training programs to keep teams updated on AI capabilities and best practices, ensuring they continue to evolve alongside the technology.

Finally, establish a continuous feedback loop where teams can share insights, challenges, and improvements. This phase is not just about scaling AI usage, but about embedding AI into the organizational culture—where humans and AI agents work as true partners, creating results that neither could achieve alone.

This implementation framework acknowledges the distinct roles that will emerge in the AI-augmented workplace. Certain roles—particularly those involving operational tasks, data analysis, and routine decision-making—will be primarily handled by AI agents. Meanwhile, humans will focus on roles that require creativity, emotional intelligence, leadership, and ethical oversight. This division isn’t about replacement but about optimization: enabling each type of team member, whether human or AI, to focus on their strengths.

Cultural Transformation

Perhaps the most challenging aspect of leading hybrid teams is managing the cultural transformation required. Through our work with organizations across industries, we’ve observed that successful cultural transformation in the age of AI agents requires:

1. Mindset Shift: Moving from seeing AI agents as tools to viewing them as co-workers, while maintaining a clear understanding of their current capabilities and limitations.

2. Value Realignment: Helping human team members understand that their value lies in uniquely human capabilities like creativity, resilience, agility, emotional intelligence, and complex problem-solving.

3. Continuous Learning Culture: Fostering an environment where both humans and AI agents are expected to learn and improve continuously. While humans need to grow, AI needs to learn from feedback.

Looking Ahead: The Future of Hybrid Leadership

As AI agents continue to evolve through the Progression Framework, leadership will need to adapt further. While current implementations primarily involve Level 1-3 agents, we must prepare for the emergence of more advanced capabilities.

The key to successful leadership in this evolving landscape is maintaining what we call “Adaptive Leadership Balance”—the ability to adjust and integrate leadership styles and approaches as AI capabilities advance, while always keeping human needs and potential at the center of the equation.

Leaders must remember that while AI agents can handle increasingly complex tasks, the essence of leadership remains fundamentally human. The ability to inspire, show empathy, navigate complex ethical decisions, and foster innovation will continue to be uniquely human capabilities.

In fact, the biggest challenge for human leaders will be to strategically integrate AI into our human experience, so we have the opportunity to become not less but more human—more empathetic, more creative, and more attuned to what makes life meaningful. In this paradigm, AI agents will be powerful amplifiers of our humanity, enabling humans to explore the depths of their potential and redefine the boundaries of human achievement.

***

Leading hybrid teams of humans and AI agents represents one of the most significant shifts in management practice since the Industrial Revolution. Success in this new era requires a delicate balance of technical understanding, human empathy, and strategic vision.

As we continue to work with organizations implementing AI agents, we’re constantly learning and refining our understanding of effective leadership in this new context. The frameworks and approaches we’ve outlined here provide a foundation, but the field is rapidly evolving. The most successful leaders will be those who can adapt these principles to their specific contexts while maintaining a strong focus on both human development and technological integration.

The future of leadership isn’t about choosing between human and artificial intelligence—there is no need to do so, as they are two different types of animals<sup>190</sup>—it’s about creating synergies between them so that human performance and organizations can be uplifted to unprecedented levels. By understanding and applying these principles, leaders can build highly effective hybrid teams that leverage the best of both human and artificial capabilities.

The Foundation: Management Vision and Governance

Through our work implementing AI agent transformations, we’ve discovered a fundamental truth: success begins and ends with strong management vision and engagement. Organizations where leadership teams are fully aligned on how AI agents support their broader strategy are twice as likely to succeed in their transformation efforts. But what does effective management engagement look like in practice? Let’s explore this through the lens of real experiences and proven approaches.

Setting the Vision Through Experience

One of the most common pitfalls we see is management teams attempting to drive AI agent transformations through strategy documents and PowerPoint presentations alone. This abstract approach often leads to unrealistic expectations and misaligned goals. We witnessed a powerful alternative approach at a global telecommunications company we worked with recently. Instead of starting with presentations, their CEO and executive team devoted an entire day to experiencing AI agents firsthand.

The executives worked alongside customer service teams, experimenting with the technology themselves. They observed how agents handled customer inquiries, processed requests, and managed exceptions. This immersive experience proved transformative. The CEO later shared that those few hours completely changed his perspective on what AI agents could and couldn’t do.

The hands-on experience helped the leadership team develop a vision grounded in practical reality rather than hype or fear, which, in turn, helped them communicate to their workforce why and how AI agents should be adopted by their organization. As a result, the leadership team was better able to inspire employees and convince them that AI agents would be a good thing, for specific reasons aligned with the purpose and goals of the organization.

This approach reflects a crucial principle we’ve observed across successful transformations: management vision must be rooted in practical understanding if it is to be successful in motivating and persuading employees. When executives spend time actually working with AI agents, even for just a few hours, they develop an intuitive grasp of the technology’s capabilities and limitations. This understanding leads to more realistic and achievable transformation goals, which can be more easily communicated and applied to the specific situation of the workforce.

The Power of Leading by Example

Perhaps the most underappreciated aspect of management’s role in AI agent transformation is the importance of leading by example. We’ve consistently observed that when executives actively use and champion AI agents in their own work, adoption throughout the organization increases dramatically. An important reason is that leadership is then seen as more credible, and therefore as legitimate, in proposing and introducing AI agents in the organization. This phenomenon played out powerfully at a consulting firm where we helped implement AI agents across their operations.

The Managing Partner made a conscious decision to integrate AI agents into his daily work, using them openly during client presentations and internal meetings. He would demonstrate how he used agents to analyze data, generate insights, and draft preliminary recommendations. This transparency accomplished two crucial things: it demystified the technology for others in the organization, and it sent a clear, credible message that AI agents were tools that the organization and its members should embrace instead of fear.

The impact was remarkable. Within six months, the firm saw a 300% increase in voluntary adoption of AI agents across all levels of the organization. When we interviewed employees about this rapid adoption, many cited the Managing Partner’s example as a key factor in their decision to embrace the technology.

Understanding the Power of Enterprise-Wide Transformation

The implementation of AI agents represents more than just a technology upgrade—it’s a fundamental transformation in how work gets done across an organization. Through our research and hands-on experience, we’ve found that companies that apply AI agents across multiple functional areas are 2 to 3 times more likely to succeed in their transformation efforts than those that limit implementation to isolated pockets.<sup>191</sup>

Our experience consistently shows that organizations taking an enterprise-wide approach to AI agents are more likely to succeed in their transformation efforts than those limiting implementation to isolated departments. This stark difference stems from several key factors that we’ve observed across successful implementations.

Most end-to-end business processes cut across functions; the order-to-cash process, for example, involves sales, finance, manufacturing, supply chain and logistics, and multiple information systems that support those functions. Implementing agents that support such processes may be difficult and time-consuming, but it is the only way to achieve major benefits in terms of productivity gains and customer satisfaction.

Let’s explore how this played out at a global manufacturing company we worked with. Initially, they planned to implement AI agents only within their finance department for invoice processing.

We asked them to explain how they saw this application aligning with the vision and purpose of the company. Building on their answer, we then asked whether they could see more AI agent applications across the organization that would equally serve that vision and purpose. Through our guidance and questioning, the company ultimately expanded its adoption strategy across the entire organization, increasing the likelihood of creating real value across the board and for all stakeholders. This approach to guiding their adoption journey illustrates the power of thinking enterprise-wide from the start.

The Value of Comprehensive Scope

The company began by mapping interconnections between departments and quickly realized a transformative insight: the traditional approach of managing processes within isolated silos was holding them back. For example, an invoice arriving in finance was more than just a financial transaction. It rippled across procurement, vendor management, customer service, and logistics. Procurement needed to verify deliveries, customer service needed pricing information for customer inquiries, and logistics required shipping confirmations. These dependencies revealed that no department operated in isolation—yet, the systems supporting them were designed as if they did.

These insights did the trick. As the company’s leaders—under our guidance—came to understand that AI agents could only create real value within a cooperative partnership (between humans and AI, and among AI agents themselves), they quickly realized that they needed to promote cross-functional collaboration and eliminate the siloed working practices that had slipped into the organization’s mindset over the years.

So, instead of creating multiple agents—one for finance, another for procurement, and so on—the company took a bold step. They recognized that building isolated agents for each department would only replicate the very silos they were trying to overcome. Such an approach would limit the transformative potential of AI, merely automating fragmented processes without addressing the bigger challenge: creating seamless, end-to-end workflows across the enterprise.

To break free from these constraints, they designed transverse, end-to-end agentic systems capable of operating across departments. These agents didn’t just automate individual tasks within a single function; they orchestrated entire workflows that spanned multiple areas. For example, rather than having a finance AI agent pass information to a separate procurement AI agent, a single end-to-end agent was created to handle everything from invoice receipt to payment processing. This agent could validate deliveries with procurement, confirm shipping with logistics, and flag pricing inconsistencies for customer service—all within one cohesive system.
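
The end-to-end flow described here can be sketched as a single workflow function that consults procurement, logistics, and pricing data before approving payment. All function and field names below are hypothetical; the sketch only illustrates the cross-silo orchestration pattern:

```python
def process_invoice(invoice: dict, procurement: dict, logistics: set,
                    tolerance: float = 0.01) -> dict:
    """Run one invoice from receipt to a payment decision, consulting
    data that would traditionally sit in three separate departments."""
    po = invoice["po_number"]
    issues = []

    # 1. Validate delivery against procurement records.
    record = procurement.get(po, {})
    if not record.get("delivered"):
        issues.append("delivery-not-confirmed")

    # 2. Confirm shipping with logistics.
    if po not in logistics:
        issues.append("no-shipping-record")

    # 3. Flag pricing inconsistencies for customer service follow-up.
    expected = record.get("agreed_price")
    if expected is not None and abs(invoice["amount"] - expected) > tolerance * expected:
        issues.append("pricing-mismatch")

    return {"status": "approve-payment" if not issues else "escalate",
            "issues": issues}
```

The point of the pattern is that one agent owns the whole workflow: a siloed design would instead hand each check to a different departmental agent and lose the end-to-end view.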

This cross-functional approach unleashed capabilities that a siloed implementation could never have achieved. By cutting across traditional boundaries, the AI agents began surfacing insights that no single department could have identified. For instance, patterns emerged linking payment terms to vendor performance and cash flow, enabling the company to optimize vendor selection, negotiate better contracts, and improve liquidity. These insights not only improved operations but also enhanced strategic decision-making.

The results were transformative: a 40% reduction in processing time and a 25% improvement in cash flow management. But the real breakthrough was cultural. By implementing AI in a way that transcended departmental divides, the company shifted from a fragmented organization to a unified, data-driven enterprise. The AI agents became catalysts for collaboration, breaking down barriers and fostering a mindset where solving problems collectively took precedence over protecting departmental turf.

This experience underscores a crucial lesson: the power of agentic AI lies not just in its ability to automate but in its capacity to integrate. Organizations should resist the temptation to deploy multiple isolated agents and instead seize the opportunity to design systems that cut across silos. True transformation happens when AI becomes the glue that binds the enterprise, enabling it to operate as a cohesive, intelligent whole. This approach calls for building multi-agent systems in which different agents play different roles, coordinated across the board by human leaders who guide this collaborative dynamic based on the goals and vision of the organization.

规模经济和投资优化

Economies of Scale and Investment Optimization

这种企业级方法还带来了显著的规模经济效益。公司可以与技术供应商谈判更优惠的条款,在各部门之间分摊开发成本,并构建集中式支持基础设施。更重要的是,在一个领域积累的经验可以迅速应用于其他领域,从而加速整体转型,并且还有一个额外的好处:可以总结出最佳实践案例,这些案例在未来的任何转型项目中都将非常有用。

The enterprise-wide approach also delivered significant economies of scale. The company could negotiate better terms with technology vendors, share development costs across departments, and build a centralized support infrastructure. More importantly, lessons learned in one area could be quickly applied to others, accelerating the overall transformation, with the added benefit that best-practice cases were developed that would prove useful in any future transformation project.

他们对核心人工智能代理基础设施(包括安全框架、数据管理系统和集成平台)的投资可以分摊到多个用例中。这使得各个部门的实施更具成本效益,也更容易获得论证。

Their investment in core AI agent infrastructure—including security frameworks, data management systems, and integration platforms—could be amortized across multiple use cases. This made individual department implementations more cost-effective and easier to justify.

构建企业级业务案例

Building an Enterprise-Wide Business Case

他们成功的关键因素之一是制定了一份全面的企业级商业计划书。这超越了传统的部门级投资回报率计算,将跨职能效益和网络效应纳入考量。该商业计划书包括:

A crucial element of their success was developing a comprehensive enterprise-wide business case. This went beyond traditional department-level ROI calculations to consider cross-functional benefits and network effects. The business case included:

各部门直接节省成本并提高效率

Direct cost savings and efficiency gains in each department

改进数据共享和流程集成带来的跨职能效益

Cross-functional benefits from improved data sharing and process integration

通过规模经济降低技术成本

Reduced technology costs through economies of scale

通过端到端流程优化提升客户体验

Improved customer experience from end-to-end process optimization

消除所有职能部门的重复性工作,从而提高员工满意度

Enhanced employee satisfaction from the elimination of repetitive tasks across all functions

这种企业层面的视角展现了转型的全部潜在价值,从而有助于获得更广泛的高层支持。它还为确定实施优先级和跨部门资源分配提供了框架。

This enterprise-wide view helped secure even broader executive support by demonstrating the full potential value of the transformation. It also provided a framework for prioritizing implementations and allocating resources across departments.

从企业视角看待代理可能也需要新的组织结构和领导角色。大多数公司没有跨职能流程负责人,也没有能够对端到端流程做出决策的人员。理想情况下,实施新的代理架构需要创建流程负责人角色,但这不太可能实现。然而,大多数组织都需要委员会或代表来审议跨职能代理的问题。领导者应该鼓励各职能部门和单位的负责人在企业代理项目上进行协作,并有时亲自参与其中。

Taking the enterprise view of agents may also require new organizational structures and leadership roles. Most companies don’t have cross-functional process owners or anyone who can make decisions for the end-to-end process. Ideally, implementing new agentic architectures would involve the creation of process owner roles, but this is unlikely. Most organizations will, however, need councils or representatives to deliberate over cross-functional agents. Leaders should encourage their heads of functions and units to collaborate on enterprise agent projects and get involved themselves at times.

构建全面的商业案例

Building a Comprehensive Business Case

人工智能代理的商业价值需要超越传统的投资回报率计算。领导者需要鼓励基于代理项目的利益相关者采取比以往更广阔的视角。我们曾与一家大型保险公司合作,他们最初的商业案例仅关注自动化带来的成本节约。这种狭隘的视角导致了中层管理人员的抵制,他们认为该项目纯粹是为了削减成本。我们帮助他们重新构建了商业案例,纳入了我们认为对人工智能代理转型至关重要的四个关键要素:

The business case for AI agents needs to go beyond traditional return on investment calculations. Leaders need to encourage stakeholders of agent-based initiatives to take a broader perspective than they might otherwise adopt. When we worked with a major insurance company, their initial business case focused solely on cost savings from automation. This narrow view led to resistance from middle management, which saw the initiative as purely cost-cutting. We helped them rebuild their business case to include four key components that we’ve found essential for AI agent transformations:

首先,量化收益:该项目不仅能节省成本,还能带来收入增长机会。例如,他们的AI代理可以处理多40%的客户咨询,从而使保单销售额增长15%。他们还衡量了处理速度、错误率和合规准确性方面的提升。

First, quantitative benefits: This project involves not just cost savings but also revenue enhancement opportunities. For example, their AI agents could handle 40% more customer inquiries, leading to a 15% increase in policy sales. They also measured improvements in processing speed, error reduction, and compliance accuracy.

其次,是定性效益:这类效益包括员工满意度提升(因为代理人处理日常事务)、客户体验改善(通过全天候服务和始终如一的服务)以及运营韧性增强。这家保险公司发现,当人工智能代理人接管日常理赔处理工作后,员工满意度提高了30%以上。

Second, qualitative benefits: This category of benefits encompasses improved employee satisfaction (as agents handle routine tasks), better customer experience (through 24/7 availability and consistent service), and enhanced operational resilience. The insurance company found that employee satisfaction scores increased by over 30% when AI agents took over routine claims processing.

第三,实施成本:生产实施和部署的成本不仅应包括技术成本,还应包括变更管理、培训以及过渡期内可能出现的生产力下降。这家保险公司将40%的预算分配给了变更管理和培训,事实证明,这对于成功采用新系统至关重要。

Third, implementation costs: The costs of production implementation and deployment should include not just technology costs but also change management, training, and potential productivity dips during the transition period. The insurance company allocated 40% of its budget to change management and training, which proved crucial for successful adoption.

第四,风险评估与缓解:项目风险包括技术风险(例如系统集成难题)和组织风险(例如员工抵触或技能差距)。他们识别出数据安全问题等关键风险,并针对每项风险制定了具体的缓解策略。

Fourth, risk assessment and mitigation: Risks from the project include both technical risks (like system integration challenges) and organizational risks (such as employee resistance or skill gaps). They identified critical risks like data security concerns and developed specific mitigation strategies for each.

人工智能代理实施的治理:一种整体性和伦理性的方法

Governance for AI Agent Implementations: A Holistic and Ethical Approach

有效的AI代理实施治理必须平衡创新需求与管控要求,同时将伦理原则融入决策的每一个环节。这需要一个结构化且灵活的框架,以确保组织内部的协调一致,促进创新,并维护公众信任。与其他形式的AI一样,伦理治理并非仅仅关乎政策,而是要从用例构思之初就对其进行评估。192

Effective governance for AI agent implementations must balance the need for innovation with the imperative for control while embedding ethical principles into every layer of decision-making. This requires a structured yet flexible framework that ensures alignment across the organization, fosters innovation, and maintains public trust. As with other forms of AI, ethical governance is not a matter of policies alone but rather the evaluation of use cases from the time they are conceived.192

一家全球性银行在代理创新举措的整体治理和伦理方面树立了杰出的典范。该银行实施了一套以“代理创新委员会”为核心的综合治理模式。该委员会将业务部门负责人、IT主管和员工代表联合起来,共同构建了一个共享的治理框架。其关键优势在于将战略监督与运营灵活性相结合,从而在不牺牲控制的前提下,促进创新蓬勃发展。

A standout example of overall governance and ethics for agentic initiatives comes from a global bank that implemented a comprehensive governance model centered on an “Agent Innovation Council.” This council unified business unit leaders, IT executives, and employee representatives to create a shared governance framework. Its key strength lies in integrating strategic oversight with operational flexibility, allowing innovation to flourish without sacrificing control.

委员会采用三级结构运作:

The council operated through a three-tiered structure:

公司最高层设有一个指导委员会,负责制定战略方向,确保所有人工智能项目都与公司整体目标保持一致。该委员会还负责资源分配,确保高优先级项目获得充足的支持。

At the top, a steering committee set the strategic direction, ensuring all AI initiatives aligned with broader company objectives. This committee also managed resource allocation, ensuring that high-priority projects received adequate support.

第二层级,即卓越中心(CoE),作为运营中心,提供技术专长,规范实施实践,并确保遵守治理协议。

The second tier, a Center of Excellence (CoE), acted as the operational hub, providing technical expertise, standardizing implementation practices, and ensuring compliance with governance protocols.

在基础阶段,各个部门都指定了“人工智能倡导者”,负责识别应用案例、倡导采用人工智能,并作为业务部门与卓越中心之间的联络人。由于人工智能倡导者积极参与各个团队的工作,关于如何有效利用人工智能的讨论自然而然地融入到日常运营中,从而培育了一种自下而上的实验和创新文化。治理不再被视为繁文缛节,而是成为一种支持机制,鼓励负责任的人工智能实验。

At the foundational tier, individual departments designated “AI Champions” to identify use cases, advocate for adoption, and act as liaisons between business units and the CoE. With AI Champions actively present across teams, conversations about how to use AI effectively became a natural part of daily operations, fostering a bottom-up culture of experimentation and innovation. Instead of governance being perceived as bureaucratic red tape, it became a supportive structure that encouraged responsible AI experimentation.

至关重要的是,该治理框架还强调将伦理监督作为其结构不可或缺的一部分。这家全球性银行设立了“人工智能伦理委员会”,负责审查所有主要的人工智能应用。该委员会负责维护数据隐私、决策透明度、公平性和问责制等核心伦理原则。

Crucially, the governance framework also emphasized ethical oversight as an integral part of the structure. The same global bank established an “AI Ethics Board” to review all major AI implementations. This board was responsible for upholding core ethical principles such as data privacy, decision-making transparency, fairness, and accountability.

通过将伦理道德融入治理结构,该组织避免了将这些考量置于次要地位。相反,伦理审查成为人工智能开发生命周期早期阶段的强制性检查点,从而可以对应用场景进行修订,增强公众信任,并降低潜在的监管风险。

By embedding ethics into the governance structure, the organization avoided relegating these considerations to a secondary role. Instead, ethical review became a mandatory checkpoint early in the AI development lifecycle, allowing for the possibility of revising the use case, reinforcing public trust, and mitigating potential regulatory risks.

例如,伦理委员会制定了相关准则,确保人工智能代理在数据使用方面遵循明确的界限。这些准则禁止诸如在自动化招聘流程中进行带有偏见的决策或使用不透明的定价算法等做法。

For example, the ethics board established guidelines that ensured AI agents operated within clear boundaries regarding data usage. These guidelines prohibited practices such as biased decision-making in automated hiring processes or opaque pricing algorithms.

该银行超越了内部监管,建立了与客户的直接反馈机制。该实时反馈系统使客户一旦发现人工智能驱动流程中存在不一致、不公平决策或潜在偏见,即可立即提出质疑。无论是自动贷款审批、欺诈检测还是客户服务响应,用户都可以报告差异,从而启动内部审查。通过积极解决伦理问题,该银行不仅维护了自身声誉,还增强了员工和客户对其人工智能举措的信心。

The bank went beyond internal oversight and created a direct feedback loop with customers. The real-time feedback system allowed customers to flag concerns immediately when they noticed inconsistencies, unfair decisions, or potential bias in AI-driven processes. Whether it was an automated loan approval, fraud detection, or customer service response, users could report discrepancies, prompting an internal review. By proactively addressing ethical concerns, the bank not only safeguarded its reputation but also enhanced employee and customer confidence in its AI initiatives.

代理创新委员会的月度会议成为这一治理模式的核心。这些会议提供了一个平台,用于评估进展、解决问题并确保与公司不断发展的战略保持一致。通过促进跨部门协作并保持开放的反馈渠道,该委员会营造了一种创新与责任并存的文化。

The monthly meetings of the Agent Innovation Council became the linchpin of this governance model. These sessions provided a forum to evaluate progress, address concerns, and ensure alignment with the company’s evolving strategy. By fostering collaboration across departments and maintaining an open channel for feedback, the council created a culture where innovation thrived alongside accountability.

这种整合治理模式表明,有效的AI转型不仅仅是一项技术工作,更是对整个组织工作方式的重新构想。通过协调战略监督、运营支持和道德责任,该治理框架不仅确保了AI代理的顺利实施,还巩固了它们作为创造长期组织价值力量的作用。

This integrated governance model demonstrates that effective AI transformation is not just a technical endeavor—it is a reimagining of “how” work is done across the organization. By aligning strategic oversight, operational support, and ethical accountability, the governance framework not only ensured the smooth implementation of AI agents but also solidified their role as a force for long-term organizational value.

人们普遍担心,管理繁琐的人工智能项目会引入过多的官僚主义,从而阻碍创新。然而,在这个案例中,情况恰恰相反。各部门设立的人工智能倡导者营造了一种文化氛围,在这种氛围下,人工智能的应用得到了持续的讨论、完善和优化。人工智能不再是一个遥远的、由IT部门控制的系统,员工们积极参与其中,共同探讨如何利用人工智能来改善他们的日常工作。

A common concern about governance-heavy AI initiatives is that they introduce excessive bureaucracy, slowing down innovation. However, in this case, the opposite happened. The presence of AI Champions across departments created a culture where AI use was constantly discussed, refined, and optimized. Instead of AI being a distant, IT-controlled system, employees were actively engaged in shaping how AI could improve their daily work.

这种文化转变催生了自下而上的创新。例如,银行风险管理部门的一个团队受到内部人工智能讨论的启发,提出了一个基于人工智能的欺诈预警系统。由于当时的治理结构允许快速评估和测试,这个想法从概念到实施的速度比传统创新流程更快。

This cultural shift led to grassroots innovation. For instance, one team in the bank’s risk management division, inspired by internal AI discussions, proposed an AI-powered early-warning system for fraud detection. Because the governance structure allowed for rapid evaluation and testing, this idea moved from concept to implementation faster than traditional innovation pipelines.

这种方法再次强调了组织领导层从采纳过程伊始就参与的重要性,因为他们的早期介入能够立即凸显公司目标、价值观和宗旨对于组织参与的任何变革和转型项目的重要性。因此,伦理和治理将自然而然地成为领导者和员工审视和评估人工智能代理成功整合的视角。

This approach underscores again the importance for organizational leadership to be participating from the start of the adoption process as their early presence introduces an immediate focus on how important the company’s goals, values, and purpose are for any change and transformation project the organization engages in. As such, ethics and governance will, in an organic manner, become the glasses that leaders and employees will use to look at and evaluate the integration of AI agents in successful ways.

毫无疑问,着手进行人工智能转型的组织必须采取这种整体性方法,因为它将把治理定位并认可为创新的推动者和防止道德失误的保障。当治理框架的设计将伦理深度融入其结构,并在项目实施之初就展现出可见性和指导性时,它们不仅能够引导技术发展,还能提升技术效能,从而改善人类工作条件,并促进所有利益相关者的利益。

It leaves little doubt that organizations embarking on AI transformations must adopt this holistic approach, as it will position and recognize governance as both an enabler of innovation and a safeguard against ethical missteps. When governance frameworks are designed to integrate ethics deeply into their structure and are visible and directing from the very beginning of the adoption project, they will not only guide the technology but also shape its effectiveness to enhance human work conditions and contribute to the benefits and interests of all stakeholders.

关键绩效指标和监控

Key Performance Indicators and Monitoring

关键绩效指标 (KPI) 和监控对于评估人工智能代理转型是否成功至关重要。大多数管理者都熟悉 KPI,尤其是在评估员工绩效方面。然而,当涉及到评估采用人工智能代理对公司创造的价值时,许多公司却未能成功。

Key Performance Indicators (KPIs) and monitoring are essential for assessing the success of an AI agent transformation. KPIs are familiar to most managers when it comes to assessing employee performance. However, when it comes to assessing whether the adoption of AI agents generates value for the company, many companies fall short.

事实上,根据我们的经验,如今大多数公司几乎没有明确的措施来评估其人工智能应用是否成功。原因在于,公司通常一开始会进行一些各自孤立运行的人工智能代理实验。他们这样做是为了熟悉这项技术,但在这个过程中,他们很少甚至根本没有思考怎样才算成功的人工智能应用(参见整体性方法的缺失)。因此,至关重要的是从一开始就建立清晰的衡量指标,并持续监控这些指标,以指导转型之旅。通过始终聚焦于一个案例,我们可以提供一个跟踪进展的实用框架。

In fact, it is our experience that today, most companies hardly have clear measures in place to assess whether their AI adoption efforts have been successful. The reason for this is that companies usually start out by implementing a few AI agent experiments that are run in siloed ways. They do so to get a feel for the technology, but in this process, they pay little or no attention to thinking about what makes an AI adoption successful (see the lack of a holistic approach). It’s thus crucial to establish clear metrics at the outset and continuously monitor them to guide the transformation journey. By focusing on a single example throughout, we can provide a practical framework for tracking progress.

假设一家金融服务公司正在部署人工智能代理来简化运营并改善客户支持。在部署之前,该公司针对四个关键领域建立了基准指标:运营效率、员工影响、客户体验和代理学习。这些不同的领域有效地构成了一个“平衡计分卡”,用于衡量面向代理的项目的成功和价值。

Consider a financial services firm implementing AI agents to streamline operations and improve customer support. Before deployment, the company established baseline metrics across four critical areas: operational efficiency, employee impact, customer experience, and agent learning. These diverse domains effectively comprise a “balanced scorecard” for the success and value of agent-oriented projects.

运营指标衡量了处理时间、错误率和单笔交易成本。项目启动之初,流程缓慢且容易出错。实施人工智能代理后,公司流程效率提升了 60%,错误率降低了 85%。这些成果与公司降低运营成本、提高准确性的目标完全契合。

Operational metrics measured processing times, error rates, and costs per transaction. At the project’s start, processes were slow and error-prone. After implementing AI agents, the firm saw a 60% improvement in process efficiency and an 85% reduction in errors. These gains directly aligned with the company’s goal of reducing operational costs while increasing accuracy.

员工影响指标评估了人工智能代理如何增强员工的工作能力。该公司追踪了员工在日常任务上节省的时间,并就工作满意度和技能发展情况对员工进行了调查。员工表示,由于重复性任务的自动化,他们能够专注于更有意义的工作,从而将更多时间投入到战略性活动中,满意度也显著提升,投入时间增加了 60%。

Employee impact metrics assessed how AI agents augmented staff roles. The firm tracked time saved on routine tasks and surveyed employees about job satisfaction and skills development. Employees reported spending 60% more time on strategic activities and a significant boost in satisfaction as repetitive tasks were automated, enabling them to focus on more meaningful work.

客户体验指标是另一项重点关注内容,包括满意度评分、问题解决时间和可用服务时间。通过部署人工智能客服人员提供一线客户支持,该公司实现了客户满意度提升30%。更快的问题解决速度和全天候服务在提升客户体验方面发挥了重要作用。

Customer experience metrics were another key focus, tracking satisfaction scores, resolution times, and service availability. By deploying AI agents for first-line customer support, the firm achieved a 30% increase in customer satisfaction. Faster resolution times and 24/7 availability played a significant role in enhancing the customer experience.

最后,学习和适应性指标监测了智能体随时间的改进能力。最初,人工智能智能体可以自主处理 40% 的客户案例。六个月后,随着智能体适应处理更复杂的场景,这一比例上升至 75%,减少了对人工干预的需求。

Finally, learning and adaptation metrics monitored the agents’ ability to improve over time. Initially, AI agents could handle 40% of customer cases autonomously. Six months later, this figure rose to 75% as the agents adapted to handle more complex scenarios, reducing the need for human intervention.

该公司不仅利用这些指标来衡量成功,还利用它们来维持反馈机制。每月一次的评估使他们能够追踪进展、发现需要改进的地方并调整策略。例如,当监控显示客服人员在处理高优先级案件时表现停滞不前时,公司会提供额外的培训数据并对算法进行微调。这种迭代方法确保了持续改进。

The firm used these metrics not only to measure success but also to maintain a feedback loop. Monthly reviews allowed them to track progress, identify areas for improvement, and adjust their strategy. For example, when monitoring showed that agent performance plateaued in handling high-priority cases, the company provided additional training data and fine-tuned the algorithms. This iterative approach ensured continuous improvement.

通过将所有指标与最初的商业案例联系起来,该公司确保了与战略目标的一致性。清晰的关键绩效指标 (KPI)、目标与关键成果 (OKR)、增长指标以及持续的监控,使利益相关者对整体转型价值充满信心,并在整个转型过程中保持了持续的动力。这个案例说明了结构化的、数据驱动的方法如何能够使人工智能代理的转型在整体层面上可衡量、可操作且可评估——涵盖了人类成长和人工智能集成——最终取得成功。

By tying all metrics back to the original business case, the firm maintained alignment with its strategic goals. Clear KPIs, OKRs, growth indicators, and consistent monitoring gave stakeholders confidence in the value of holistic transformation and sustained momentum throughout the journey. This example illustrates how a structured, data-driven approach can make AI agent transformations measurable, actionable, and evaluative in terms of progress at a holistic level—encompassing both human growth and AI integration—ultimately leading to success.
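The four measurement areas above can be sketched as a simple "balanced scorecard" structure. This is an illustrative sketch, not the firm's actual system: the metric names, baselines, and current values below are hypothetical placeholders loosely echoing the percentages quoted in the example.

```python
# Illustrative balanced scorecard for an AI agent rollout: each metric is
# compared against its pre-deployment baseline, and improvements are
# aggregated per area (operational, employee, customer, learning).
from dataclasses import dataclass, field


@dataclass
class Metric:
    name: str
    baseline: float
    current: float
    higher_is_better: bool = True

    def improvement_pct(self) -> float:
        """Relative change from baseline, signed so positive = improvement."""
        change = (self.current - self.baseline) / self.baseline * 100
        return change if self.higher_is_better else -change


@dataclass
class Scorecard:
    areas: dict = field(default_factory=dict)

    def add(self, area: str, metric: Metric) -> None:
        self.areas.setdefault(area, []).append(metric)

    def report(self) -> dict:
        """Average improvement per area, in percent."""
        return {
            area: sum(m.improvement_pct() for m in ms) / len(ms)
            for area, ms in self.areas.items()
        }


# Hypothetical numbers loosely based on the example in the text
card = Scorecard()
card.add("operational", Metric("process efficiency index", 100, 160))
card.add("operational", Metric("errors per 1,000 cases", 20, 3, higher_is_better=False))
card.add("customer", Metric("satisfaction score", 70, 91))
card.add("learning", Metric("autonomous case handling %", 40, 75))

print(card.report())
```

A monthly review like the one the firm ran would simply regenerate this report and compare it against the targets set in the original business case.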

第十一章

CHAPTER 11

人工智能代理的规模化:从愿景到现实

SCALING AI AGENTS: FROM VISION TO REALITY

正确的扩展方法

The Right Scaling Approach

从规则到推理:在企业中扩展人工智能代理

From Rules to Reasoning: Scaling AI Agents in the Enterprise

当我们展示试点实施的最终数据时,会议室里一片寂静。三级人工智能代理在短短两周内成功处理了超过1万个客户服务请求,准确率甚至超过了我们人工客服的基准水平。高管团队明显被震撼到了,纷纷向前倾身。随后,我们预料到的问题出现了:“效果显著。我们该如何将这项技术推广到整个组织?”

The conference room fell silent as we displayed the final numbers from our pilot implementation. The Level 3 AI agent had successfully processed over 10,000 customer service requests in just two weeks, achieving an accuracy rate that exceeded our human baseline. The executive team leaned forward, clearly impressed. Then came the question we’d been anticipating: “Great results. How do we scale this across the entire organization?”

去年我们曾与一家财富500强保险公司合作,该公司就曾发生过这样一幕:人工智能代理在企业范围内大规模部署的巨大潜力及其固有的挑战。尽管个别试点项目的成功案例日益增多,但真正能够将人工智能代理的应用扩展到独立用例之外的组织却寥寥无几。

This scene, which unfolded at a Fortune 500 insurance company we worked with last year, highlights both the immense potential and the inherent challenges of scaling AI agents across an enterprise. While individual pilot successes are increasingly common, very few organizations have managed to extend their AI agent implementations beyond isolated use cases.

我们的研究揭示了一个惊人的数据:在试点三级人工智能代理的公司中,只有不到1%的公司能够成功大规模部署。这与7-10年前智能自动化(二级代理)兴起时的情况如出一辙。当时,各组织在扩展其项目规模时也面临着类似的障碍。

Our research reveals a striking statistic: fewer than 1% of companies piloting Level 3 AI agents successfully deploy them at scale. This mirrors the situation we encountered 7–10 years ago with the emergence of intelligent automation (Level 2 agents). At that time, organizations faced similar hurdles in scaling their initiatives.

鉴于目前还没有任何公司建立起扩展 3 级人工智能代理的成熟框架,我们将借鉴过去的经验和当前(尽管有限)的观察结果,来概述最佳方法。

Given that no company has yet established a proven framework for scaling Level 3 AI agents, we will draw from past experiences and current, albeit limited, observations to outline the optimal approach.

基础:了解从哪里开始

The Foundation: Understanding Where to Start

当我们最初与这家全球保险公司合作时,其领导团队急于在整个理赔流程中部署人工智能代理。他们的热情可以理解——试点结果令人信服。然而,我们之前已经见过类似的情况。

When we first began working with this global insurance company, the leadership team was eager to jump straight into deploying AI agents across their entire claims processing operation. Their enthusiasm was understandable—the pilot results were compelling. However, we’d seen this movie before.

“在讨论规模化之前,”我们告诉他们,“先来谈谈三年前你们RPA实施的情况。”首席信息官不安地在椅子上挪了挪身子。他们之前尝试扩大自动化规模,在初期取得成功后就停滞不前,导致一位高管所说的“一堆报废机器人”。

“Before we talk about scaling,” we told them, “let’s discuss what happened with your RPA implementation three years ago.” The CIO shifted uncomfortably in his chair. Their previous attempt to scale automation had stalled after initial successes, leading to what one executive called “a graveyard of broken bots.”

寻找合适的机遇

Finding the Right Opportunities

我们没有自上而下地妄下结论,而是采取了基层调研的方式,花了六周时间深入到公司各个部门。“我们需要了解员工实际把时间花在哪里,而不是我们认为他们把时间花在哪里,”我们向高管团队解释道。

Rather than making assumptions from the top down, we took a grassroots approach, spending six weeks embedded with various departments across the organization. “We need to understand where people are actually spending their time, not where we think they’re spending it,” we explained to the executive team.

我们对来自理赔、客户服务、核保和运营等部门的50多位团队领导和员工进行了结构化访谈。这些访谈并非简单的正式会议——我们与员工们坐在一起,观察他们的日常工作,并倾听他们的不满。一位理赔员的话尤其让我们印象深刻:“我每天要花大约四个小时从不同的系统中收集和整理信息,才能开始分析理赔。”

We conducted structured interviews with over 50 team leaders and employees across claims, customer service, underwriting, and operations. These weren’t just formal meetings—we sat with employees, observed their daily work, and listened to their frustrations. One claims adjuster’s comment particularly stuck with us: “I spend about four hours each day just gathering and organizing information from different systems before I can even start analyzing a claim.”

这种洞察力非常宝贵。通过这些对话和观察,我们为每个部门绘制了详细的工作量图,明确了时间都花在了哪里,更重要的是,哪些环节存在时间浪费。我们发现,各部门员工将 60% 到 70% 的时间都花在了他们所谓的“行政杂务”上,而不是从事增值工作。

This kind of insight proved invaluable. Through these conversations and observations, we created detailed workload maps for each department, identifying where time was being spent and, more importantly, where it was being wasted. We discovered that employees across departments were spending 60-70% of their time on what they described as “administrative overhead” rather than value-added work.

优先考虑影响

Prioritizing for Impact

完成工作负载映射后,我们应用了所谓的“20/80 原则”——找出占用人们 80% 时间的 20% 的活动。这些高工作负载活动成为我们人工智能代理实现的主要关注点。

With our workload mapping complete, we applied what we call the “20/80 principle”—identifying the 20% of activities consuming 80% of people’s time. These high-workload activities became our primary focus for AI agent implementation.

例如,在理赔部门,我们发现有三项活动耗时过长:

In the claims department, for example, we found three activities that consumed disproportionate amounts of time:

从多个系统收集和整合信息

Information gathering and consolidation from multiple systems

初步理赔分诊和路由

Initial claims triage and routing

与客户和供应商的标准通信

Standard correspondence with customers and providers

这些工作不仅耗时,员工们还形容它们重复乏味、容易出错。“这工作简直让人麻木,”一位团队成员告诉我们,“我学的是保险理赔​​,但现在我一天的大部分时间都在复制粘贴信息。”

These activities weren’t just time-consuming—they were also described by employees as repetitive, tedious, and prone to errors. “It’s mind-numbing work,” one team member told us. “I went to school for insurance adjusting, but I spend most of my day copying and pasting information.”
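The "20/80" selection step described above can be expressed as a short ranking routine: sort activities by hours logged and keep the smallest set that covers roughly 80% of total time. The activity names and hour counts below are hypothetical, not the insurer's real workload data.

```python
# Rank activities by time consumed and select the smallest set covering
# ~80% of total hours -- the "20/80 principle" applied to workload maps.
def top_workload(activities: dict, cutoff: float = 0.8) -> list:
    total = sum(activities.values())
    selected, covered = [], 0.0
    for name, hours in sorted(activities.items(), key=lambda kv: -kv[1]):
        if covered >= cutoff * total:
            break  # already covering the cutoff share of total time
        selected.append(name)
        covered += hours
    return selected


# Hypothetical weekly hours for one claims team
claims_dept = {
    "information gathering": 160.0,
    "claims triage and routing": 60.0,
    "standard correspondence": 50.0,
    "claim analysis": 40.0,
    "team meetings": 20.0,
}

print(top_workload(claims_dept))
```

Run against each department's workload map, this yields a shortlist of candidate processes for AI agent implementation, which then pass through the feasibility screen described next.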

可行性评估

Feasibility Assessment

并非所有耗时的流程都适合人工智能代理自动化。我们根据以下三个关键标准评估了每项机会:

Not every time-consuming process is suitable for AI agent automation. We evaluated each opportunity against three key criteria:

技术可行性:目前的人工智能代理技术能否有效处理这项任务?

Technical feasibility: Could current AI agent technology handle the task effectively?

工艺稳定性:该工艺是否足够稳定以进行自动化,还是需要进行重大重新设计?

Process stability: Was the process stable enough to automate, or did it require significant redesign?

数据可用性:我们是否能够以可用的格式获取必要的数据?

Data availability: Did we have access to the necessary data in a usable format?

这项评估使我们意识到一个重要事实:我们许多工作量繁重的流程都通过共享数据和系统相互关联。理赔员在信息收集方面遇到的困难,与承保人用于保单续保的许多数据源相同。

This assessment led us to an important realization: many of our high-workload processes were interconnected through shared data and systems. The information gathering that claims adjusters struggled with used many of the same data sources that underwriters accessed for policy renewals.

构建规模化商业案例

Building the Business Case for Scale

我们在为期六周的评估阶段收集到的洞见不仅帮助我们发现了机遇,还为我们构建了一个令人信服的商业案例奠定了基础。“数据会说话,”我们向管理团队解释道,“而我们的数据准确地告诉我们,人工智能代理可以在哪些方面以及如何创造最大价值。”

The insights we gathered during our six-week assessment phase didn’t just help us identify opportunities—they provided the foundation for a compelling business case. “Data tells the story,” we explained to the executive team. “And our data tells us exactly where and how AI agents can create the most value.”

我们从工作量映射和可行性评估中收集的数据构成了我们商业案例的基础。我们围绕定量和定性两方面的收益构建了该案例。在定量方面,我们计算得出,仅在理赔部门,自动化已识别的高工作量流程每年就能节省约 45,000 人时,相当于直接节省 320 万美元的成本。定性收益同样令人信服:降低错误率、加快理赔处理速度、提高客户满意度,以及通过消除重复性任务提升员工敬业度。

The data gathered from our workload mapping and feasibility assessments formed the foundation of our business case. We structured it around both quantitative and qualitative benefits. On the quantitative side, we calculated that automating the identified high-workload processes would save approximately 45,000 person-hours annually across the claims department alone, translating to $3.2 million in direct cost savings. The qualitative benefits were equally compelling: reduced error rates, faster claims processing, improved customer satisfaction, and enhanced employee engagement through the elimination of repetitive tasks.

我们还考虑了实施成本,包括技术许可、开发资源和变更管理工作,最终得出三年内预期投资回报率为 285%。这份全面的商业案例对于获得高管支持和全面实施所需的资金至关重要。更重要的是,它建立了清晰的成功指标,我们可以在整个实施过程中进行跟踪,这有助于我们保持势头,并在推广的每个阶段向利益相关者展示价值。

We also factored in implementation costs, including technology licenses, development resources, and change management efforts, arriving at an expected ROI of 285% over three years. This comprehensive business case proved crucial in securing executive buy-in and necessary funding for the full-scale implementation. More importantly, it established clear metrics for success that we could track throughout the implementation, helping us maintain momentum and demonstrate value to stakeholders at each phase of the rollout.
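The ROI arithmetic above can be checked with a back-of-the-envelope sketch. Only the $3.2M annual savings and the 285% three-year ROI come from the text; the implementation cost is not stated, so the snippet derives the cost figure those two numbers would imply.

```python
# Back-of-the-envelope ROI check: ROI% = (benefit - cost) / cost * 100.
def roi_pct(total_benefit: float, total_cost: float) -> float:
    """Return on investment as a percentage of cost."""
    return (total_benefit - total_cost) / total_cost * 100


annual_savings = 3.2e6  # direct cost savings per year, from the text
years = 3

# Cost implied by a 285% three-year ROI: benefit / cost = 1 + 2.85
implied_cost = annual_savings * years / (1 + 2.85)
print(f"implied implementation cost ≈ ${implied_cost / 1e6:.2f}M")
print(f"three-year ROI = {roi_pct(annual_savings * years, implied_cost):.0f}%")
```

The implied all-in cost of roughly $2.5M (covering licenses, development, and change management) is consistent with the stated figures, though the book does not break the budget down at this level.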

实现规模化的系统路径

The Systematic Path to Scale

在确定了优先机会之后,我们制定了一套结构化的三阶段方法,用于在公司内部署每个人工智能代理。我们没有试图一次性完成所有工作,而是有条不紊地推进流程重组、部署迭代和生产迁移。让我们来看看这种方法在我们第一个主要部署项目——理赔信息收集和整合代理——中的应用效果如何。

With our priority opportunities identified, we established a structured three-phase approach for implementing each AI agent in the company. Rather than tackle everything at once, we moved methodically through process redesign, deployment sprints, and production migration. Let’s see how this worked with our first major implementation—the claims information gathering and consolidation agent.

第一阶段:流程重新设计和优化

Phase 1: Process Redesign and Optimization

“既然要实现自动化,那就确保自动化的是正确的流程,”这成了我们的座右铭。我们与由理赔员、IT专家和流程优化专家组成的跨职能团队合作,花了三周时间详细梳理了当前的流程。调查结果令人大开眼界——理赔员需要访问七个不同的系统来收集信息,而且常常是重复操作。

“If we’re going to automate this, let’s make sure we’re automating the right process,” became our mantra. Working with a cross-functional team of claims adjusters, IT specialists, and process excellence experts, we spent three weeks mapping the current process in detail. The findings were eye-opening—adjusters were accessing seven different systems to gather information, often in a redundant manner.

我们从零开始重新构想了整个流程,并自问:“如果今天用人工智能代理来构建这个系统,它会如何运作?”重新设计的流程整合了接入点,标准化了数据格式,并在人工智能代理和人工理赔员之间建立了清晰的交接机制。通过实施一个人工智能代理可以直接查询的集中式数据湖,我们将系统交互点从七个减少到三个。

We reimagined the process from scratch, asking ourselves: “If we were building this today with AI agents, how would it work?” The redesigned process consolidated access points, standardized data formats, and created clear handoffs between AI agents and human adjusters. We reduced the system touchpoints from seven to three by implementing a centralized data lake that the AI agent could query directly.

第二阶段:部署冲刺

Phase 2: Deployment Sprints

有了重新设计的流程,我们便进入了开发阶段,将工作组织成为期两周的迭代周期。每个迭代周期都专注于交付可测试和完善的特定功能。对于理赔信息代理,我们安排了六个迭代周期,每个迭代周期都在前一个迭代周期的功能基础上进行扩展。

With our redesigned process in hand, we moved into the development phase, organizing our work into two-week sprints. Each sprint focused on delivering specific functionality that could be tested and refined. For the claims information agent, we structured six sprints, each building upon the previous one’s functionality.

我们让理赔员参与到每一次迭代评审中,收集他们的反馈并进行调整。这种快速迭代的模式被证明非常宝贵。例如,在第三次迭代中,理赔员指出某些类型的理赔需要特殊处理——这是我们最初没有考虑到的。我们迅速调整了代理的逻辑以应对这些例外情况。

We involved claims adjusters in every sprint review, gathering their feedback and making adjustments. This rapid iteration cycle proved invaluable. During the third sprint, for instance, adjusters pointed out that certain claim types required special handling—something we hadn’t initially considered. We quickly adapted the agent’s logic to account for these exceptions.

第三阶段:测试和生产迁移

Phase 3: Testing and Production Migration

我们没有采用“大爆炸”式的方法,而是实施了分阶段部署策略:

Rather than a “big bang” approach, we implemented a graduated deployment strategy:

第一周:10% 的索赔申请通过人工智能代理进行处理。

Week 1: 10% of incoming claims routed through the AI agent

第二周:25%的索赔,并扩大了监测范围

Week 2: 25% of claims, with expanded monitoring

第三周:索赔处理量为50%,人工审核减少。

Week 3: 50% of claims, with reduced manual oversight

第 4 周:75% 的索赔,维持审计流程

Week 4: 75% of claims, maintaining audit protocols

第 5 周:全面部署并进行标准监控

Week 5: Full deployment with standard monitoring
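The weekly ramp above can be sketched as percentage-based traffic routing. The schedule numbers come from the text; the hash-based routing mechanism is an assumed implementation detail (the book does not say how claims were split), chosen here so that each claim routes consistently and never flips back from the agent to the human path as the percentage grows.

```python
# Graduated rollout: route a growing share of claims to the AI agent,
# keeping each claim's routing stable via deterministic hashing.
import hashlib

ROLLOUT_SCHEDULE = {1: 10, 2: 25, 3: 50, 4: 75, 5: 100}  # week -> % to agent


def route_to_agent(claim_id: str, week: int) -> bool:
    """True if this claim goes to the AI agent in the given week.

    Hashing the claim ID into a fixed bucket (0-99) means the same claim
    always compares against the same threshold, so claims only move from
    the human path to the agent as the weekly percentage increases.
    """
    pct = ROLLOUT_SCHEDULE.get(week, 100)
    digest = hashlib.sha256(claim_id.encode()).digest()
    bucket = digest[0] * 100 // 256  # deterministic bucket in 0..99
    return bucket < pct


print(route_to_agent("CLM-2024-0001", week=1))
```

A side effect of this design is that audit teams can replay any week's routing decisions exactly, which supports the kind of expanded monitoring and audit protocols the schedule calls for.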

这种循序渐进的方法让我们在管控风险的同时建立了信心。“起初,我总是反复核查代理人做的每一件事,”一位资深理赔员告诉我们,“但几周后,我更信任他们的工作,而不是我自己收集的数据。”

This gradual approach allowed us to build confidence while managing risk. “At first, I kept double-checking everything the agent did,” one senior adjuster told us. “But after a few weeks, I trusted it more than I trusted my own data gathering.”

结果与经验教训

The Results and Lessons Learned

在六个月内,我们成功地将人工智能代理扩展到多个流程中,并取得了显著成果:

Within six months, we had successfully scaled AI agents across multiple processes, achieving significant results:

信息收集时间缩短了70%。

Reduced information-gathering time by 70%

85% 的标准客户信函已实现自动化

Automated 85% of standard customer correspondence

缩短理赔处理时间 45%

Cut claims processing time by 45%

更重要的是,员工满意度显著提升。“多年来,我第一次把大部分时间花在真正分析理赔案件上,而不是仅仅收集信息,”一位资深理赔员表示。

More importantly, employee satisfaction improved significantly. “For the first time in years, I’m spending most of my day actually analyzing claims instead of just gathering information,” reported one senior adjuster.

我们这段历程中的关键经验是什么?人工智能代理的成功关键不在于部署最先进的技术,而在于找到自动化能够创造最大价值的合适时机。通过系统化的实施和扩展方法,企业可以创造可持续的价值,同时避免那些导致许多自动化项目失败的常见陷阱。

The key lessons from our journey? Success with AI agents isn’t about implementing the most advanced technology—it’s about finding the right opportunities where automation can create the most value. By taking a systematic approach to implementation and scaling, organizations can create sustainable value while avoiding the common pitfalls that derail many automation initiatives.

展望未来,我们将不断改进我们的方法。技术格局瞬息万变,新技术层出不穷。然而,基本原则始终不变:抓住正确的机遇,深思熟虑地重新设计流程,系统地实施,并始终牢记人的因素。

As we look to the future, we continue to evolve our approach. The technology landscape is rapidly changing, with new capabilities emerging regularly. However, the fundamental principles remain the same: start with the right opportunities, redesign processes thoughtfully, implement systematically, and always keep the human element in mind.

正如一家全球保险公司的团队负责人所说:“这些人工智能代理并没有取代我们;它们最终让我们能够做我们被聘用时应该做的工作。”这或许才是衡量人工智能代理规模化应用是否成功的最终标准。

As one team leader at the global insurance company put it, “These AI agents aren’t replacing us; they’re finally letting us do the job we were hired to do.” That, perhaps, is the ultimate measure of success in scaling AI agents.

自动化体验优势:从二级代理到三级代理

The Automation Experience Advantage: From Level 2 to Level 3 Agents

在人工智能领域,从基础自动化到复杂人工智能代理的路径并非总是笔直的。一些企业发现,他们之前在自动化技术方面的投资,正意外地成为迈向更高级人工智能应用的垫脚石。为了理解这一演变过程,让我们来探究一下全球领先的建筑技术和解决方案提供商江森自控国际公司(JCI)在转型过程中的经验。

In the landscape of artificial intelligence, the journey from basic automation to sophisticated AI agents isn’t always a straight line. Some organizations are discovering that their previous investments in automation technologies are becoming unexpected stepping stones to more advanced AI implementations. To understand this evolution, let’s explore the experience of Johnson Controls International (JCI), a global leader in building technologies and solutions, as they navigate this transformation.

基础:从自动化到智能

The Foundation: From Automation to Intelligence

我们第一次与江森自控 (JCI) 全球卓越运营和智能自动化实施负责人拉姆纳特·纳塔拉詹 (Ramnath Natarajan) 会面时,就被他们现有的自动化基础设施的规模所震撼。“我们目前运营着 250 个数字员工和 2000 个 API,”他解释道,并重点介绍了他们先进的二级(智能自动化)自动化环境,该环境融合了机器人流程自动化 (RPA)、业务流程管理 (BPM) 和各种人工智能工具。

When we first met with Ramnath Natarajan, who leads global implementation of operational excellence and intelligent automation at JCI, we were struck by the scope of their existing automation infrastructure. “We currently operate with 250 digital workers and 2,000 APIs,” he explained, highlighting a sophisticated Level 2 (Intelligent Automation) automation environment that combines robotic process automation (RPA), business process management (BPM), and various AI tools.

这一基础架构看似复杂的技术细节,实则代表着更为根本的意义:它体现了整个组织对流程改进和数字化转型的坚定承诺。对于像江森自控这样的公司而言,现有的基础设施不仅仅意味着成本节约——尽管它们确实从中获得了显著的财务收益。更重要的是,它们已经构建了实施和扩展自动化解决方案所需的组织能力。

This foundation might seem like a complex technical detail, but it represents something more fundamental: an organization-wide commitment to process improvement and digital transformation. For companies like JCI, this existing infrastructure isn’t just about cost savings—though they’ve achieved significant financial benefits. It’s about having built the organizational muscles needed to implement and scale automated solutions.

变革的催化剂

The Catalyst for Change

JCI 的故事之所以特别有趣,是因为他们意识到,虽然他们的 2 级智能自动化系统正在创造价值,但它们也遇到了固有的局限性。传统的自动化方法擅长处理结构化、可预测的任务,但难以应对现代企业日益需要的自适应、情境感知操作。

What makes JCI’s story particularly interesting is their recognition that while their Level 2 intelligent automation systems were delivering value, they were hitting natural limitations. Traditional automation approaches excel at handling structured, predictable tasks but struggle with the kind of adaptive, context-aware operations that modern businesses increasingly require.

“我们正在超越特定任务的自动化,转向能够协调整个工作流程的智能体,”纳塔拉詹在我们的讨论中分享道。这种转变体现了二级智能体和三级智能体之间的核心区别。二级系统可以处理复杂但预先设定的场景,而三级智能体则可以理解上下文、处理自然语言,并协调多种工具来实现更广泛的目标。

“We’re moving beyond task-specific automation to agents capable of orchestrating entire workflows,” Natarajan shared during our discussion. This shift represents the core difference between Level 2 and Level 3 agents. While Level 2 systems can handle complex but predetermined scenarios, Level 3 agents can understand context, process natural language, and orchestrate multiple tools to achieve broader objectives.
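The contrast can be made concrete with a sketch: a Level 2 bot executes a fixed rule table, while a Level 3 agent takes a free-form goal and orchestrates several tools to achieve it. The planner below is a deliberately toy stand-in (keyword matching) for the language-model reasoning a real Level 3 agent would use; all names are invented for illustration.

```python
# Level 2: predetermined scenario -> fixed action.
LEVEL2_RULES = {
    "invoice_received": "post_to_erp",
    "invoice_mismatch": "escalate_to_human",
}

def level2_step(event: str) -> str:
    # Anything outside the rule table needs a person.
    return LEVEL2_RULES.get(event, "escalate_to_human")

# Level 3: free-form goal -> orchestrate multiple tools toward an objective.
TOOLS = {
    "fetch_invoices": lambda: ["inv-1", "inv-2"],
    "summarize": lambda items: f"{len(items)} open invoices",
    "draft_email": lambda summary: f"Draft: {summary}",
}

def level3_agent(goal: str) -> str:
    """Toy planner: a real agent would let an LLM choose the tool sequence."""
    if "invoice" in goal.lower():
        invoices = TOOLS["fetch_invoices"]()
        summary = TOOLS["summarize"](invoices)
        return TOOLS["draft_email"](summary)
    return "escalate_to_human"
```

The Level 2 function can only react to events it was built for; the Level 3 function composes tools it was never explicitly scripted to chain, which is precisely what makes it both more capable and harder to constrain.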

层级之间的桥梁

The Bridge Between Levels

二级自动化,以RPA和智能自动化等工具为代表,擅长处理在明确定义的参数范围内重复性的、基于规则的任务。二级系统依赖于预定义的规则和结构化数据,就像高效的流水线工人一样。它们在特定任务上表现出色,但如果没有人工干预,就无法适应重大变化。

Level 2 automation, characterized by tools like RPA and intelligent automation, excels at handling repetitive, rule-based tasks within clearly defined parameters. Level 2 systems, with their reliance on predefined rules and structured data, are like highly efficient assembly line workers. They’re excellent at their specific tasks but can’t adapt to significant changes without human intervention.

例如,江森自控(JCI)利用RPA技术简化计费流程,显著减少了人为错误并提高了效率。然而,即便取得了这些成果,人工监督仍然是至关重要的环节。这些系统难以适应动态的跨职能工作流程,也难以处理需要精细决策的例外情况。这种局限性在客户服务和现场运营等领域尤为明显,流程碎片化和对人工干预的依赖常常导致延误和效率低下。

For example, JCI used RPA to streamline billing processes, significantly reducing manual errors and improving efficiency. However, even with these gains, human oversight remained a critical component. These systems struggled to adapt to dynamic, cross-functional workflows or manage exceptions that required nuanced decision-making. This limitation became evident in areas such as customer service and field operations, where delays and inefficiencies often resulted from fragmented processes and reliance on human intervention.

JCI 向 3 级人工智能代理的演进旨在弥补这些差距。纳塔拉詹强调了战略的转变:“我们正在超越任务自动化,转向能够协调整个工作流程的代理。重点在于消除对人的依赖,实现跨职能部门的无缝流程执行。”这一愿景需要将自动化重新构想为一个相互关联、适应性强的系统,而不是一系列孤立的任务。

JCI’s progression to Level 3 AI agents was designed to address these gaps. Natarajan emphasized the shift in strategy: “We’re moving beyond task automation to agents that can orchestrate entire workflows. The focus is on eliminating human dependencies and achieving seamless process execution across functions.” This vision required reimagining automation as an interconnected, adaptive system rather than a collection of isolated tasks.

江森自控 (JCI) 迈向 L3 级自动化的历程,展现了人工智能代理如何将现有的智能自动化系统转变为强大且适应性强的生态系统。这一转变并非取代智能自动化,而是对其进行增强和连接。通过从特定应用入手——例如管理异常情况或处理细致入微的决策——人工智能代理逐步证明其价值,建立信任,并为更广泛的部署铺平道路。

JCI’s journey to Level 3 automation illustrates how AI agents transform existing intelligent automation systems into powerful, adaptive ecosystems. The transition is not about replacing intelligent automation but enhancing and connecting it. By starting with targeted applications—like managing exceptions or handling nuanced decisions—AI agents prove their value incrementally, building trust and paving the way for broader deployment.

这种方法不仅能帮助组织提升运营效率,还能增强战略敏捷性。人工智能代理可以消除瓶颈,减少对人工干预的依赖,并使团队能够专注于高价值活动。正如纳塔拉詹精辟地指出:“这不仅仅是自动化;而是要创建一个能够自我适应并蓬勃发展的系统。” 江森自控的成功表明,迈向第三级自动化既可行又具有变革意义。

This approach enables organizations to achieve not only operational efficiency but also strategic agility. AI agents eliminate bottlenecks, reduce reliance on human intervention, and empower teams to focus on high-value activities. As Natarajan succinctly put it: “It’s not just about automation; it’s about creating a system that adapts and thrives on its own.” JCI’s success demonstrates that the leap to Level 3 is both achievable and transformative.

经验的优势

The Advantage of Experience

JCI 的案例之所以特别具有启发意义,在于他们如何利用二级自动化经验为三级智能体的部署创造优势。这主要归功于以下几个关键因素:

What makes JCI’s story particularly instructive is how their experience with Level 2 automation created advantages in implementing Level 3 agents. Several key factors contributed to this:

1. 基础设施就绪:他们现有的 API 和数字员工网络为更复杂的代理程序提供了现成的基础。这意味着他们并非从零开始,而是扩展和增强现有功能。

1. Infrastructure Readiness: Their existing network of APIs and digital workers provided a ready-made foundation for more sophisticated agents to build upon. This meant they weren’t starting from scratch but rather extending and enhancing existing capabilities.

2. 流程理解:多年的自动化实施让他们对业务流程有了深刻的了解,并意识到哪些环节最需要人工干预。正如纳塔拉詹所指出的,他们的目标变成了“用完全自主、流程驱动的代理来取代人为干预的依赖关系。”

2. Process Understanding: Years of implementing automation had given them deep insights into their business processes and where human intervention was most crucial. As Natarajan noted, their goal became to “replace human-in-the-loop dependencies with fully autonomous, process-driven agents.”

3. 组织架构协调:或许最重要的是,他们已经建立了实施和扩展技术解决方案所需的组织架构。他们拥有超过100名员工的团队,致力于卓越运营和自动化,为这一变革提供了必要的人力资源。

3. Organizational Alignment: Perhaps most importantly, they had already established the organizational structures needed to implement and scale technological solutions. Their team of over 100 employees dedicated to operational excellence and automation provided the human expertise needed to guide this evolution.

从挑战中学习

Learning from Challenges

转型并非一帆风顺,这些困难也为其他考虑类似转型的组织提供了宝贵的经验教训。JCI 就遇到了几个重大障碍:

The transition hasn’t been without its challenges, and these difficulties offer valuable lessons for other organizations considering similar journeys. JCI encountered several significant hurdles:

集成复杂性:即使拥有丰富的经验,将代理与遗留系统连接起来仍然充满挑战。“考虑到组织内部系统的异构性,按地区划分数据非常复杂,”Natarajan解释道。这凸显了即使是拥有成熟自动化能力的组织,在迁移到更复杂的代理时,也需要仔细考虑其技术架构。

Integration Complexity: Even with their extensive experience, connecting agents with legacy systems proved challenging. “Grounding data by region is complex, given the organization’s heterogeneous systems,” Natarajan explained. This highlights how even organizations with mature automation capabilities need to carefully consider their technical architecture when moving to more sophisticated agents.

供应商局限性:江森自控发现,他们现有的自动化供应商未必完全准备好实施更高级的代理程序。他们发现,一些供应商缺乏深度集成能力,导致代理程序开发速度缓慢;而另一些供应商对代理程序的定义过于宽泛,则造成了兼容性方面的挑战。这一经验凸显了在进行三级自动化实施时,无论是否存在现有合作关系,都必须认真评估技术合作伙伴的重要性。

Vendor Limitations: JCI found that their existing automation vendors weren’t necessarily fully ready for more advanced agent implementations. They discovered that some vendors’ lack of deep integration capabilities slowed agent development, while some other vendors’ broad agent definitions created alignment challenges. This experience underscores the importance of carefully evaluating technology partners for Level 3 implementations, regardless of existing relationships.

前进之路

The Path Forward

JCI的经验表明,虽然以往的自动化经验可以为实施三级代理提供优势,但这并非必要条件。关键在于理解三级代理代表了一种截然不同的自动化方法——这种方法需要仔细考虑技术和组织因素。

JCI’s experience suggests that while previous automation experience can provide advantages in implementing Level 3 agents, it’s not a prerequisite. The key is understanding that Level 3 agents represent a fundamentally different approach to automation—one that requires careful consideration of both technical and organizational factors.

他们对未来的愿景包括到2026年实现多智能体系统的运行,这凸显了这一发展是长期数字化转型之旅的一部分。“智能体将执行端到端流程,实现无缝集成,并增强业务自主性,”纳塔拉詹分享道,并描绘了人工智能智能体成为业务运营不可或缺组成部分的未来图景。

Their vision for the future includes having multi-agent systems operational by 2026, highlighting how this evolution is part of a longer-term digital transformation journey. “Agents will execute end-to-end processes, integrate seamlessly, and enhance business autonomy,” Natarajan shared, outlining a future where AI agents become integral to business operations.

组织需要考虑的关键因素

Key Considerations for Organizations

对于正在考虑向三级代理迈进的组织而言,江森自控的经验提供了几点启示:

For organizations considering their own journey toward Level 3 agents, several insights emerge from JCI’s experience:

从明确的应用案例入手:江森自控专注于客户服务、现场运营、企业财务和采购,为代理实施方案设定了清晰且以价值为导向的目标。这种聚焦式方法有助于应对复杂性并展现价值。

Start with Clear Use Cases: JCI’s focus on customer service, field operations, corporate finance, and procurement provided clear, value-driven targets for agent implementation. This focused approach helps manage complexity and demonstrate value.

发挥优势:无论这些优势是来自以往的自动化经验还是其他组织能力,都要识别并利用现有的优势,而不是从零开始。

Build on Strengths: Whether those strengths come from previous automation experience or other organizational capabilities, identify and leverage existing advantages rather than starting completely fresh.

管理预期:从二级到三级的跃升不仅仅是技术上的升级,它代表着工作方式的根本性转变。组织需要为由此带来的机遇和挑战做好准备。

Manage Expectations: The jump from Level 2 to Level 3 isn’t just a technical upgrade—it represents a fundamental shift in how work gets done. Organizations need to prepare for both the opportunities and challenges this presents.

更广阔的视角

The Broader Perspective

尽管江森自控的案例表明,以往的自动化经验如何有助于部署三级代理,但值得注意的是,这并非唯一的途径。即使缺乏丰富的自动化经验,企业仍然可以通过聚焦明确的用例并构建必要的组织能力,成功实施三级代理。

While JCI’s story demonstrates how previous automation experience can facilitate the implementation of Level 3 agents, it’s important to note that this isn’t the only path forward. Organizations without extensive automation experience can still successfully implement Level 3 agents by focusing on clear use cases and building the necessary organizational capabilities.

关键在于理解三级智能体代表着自动化服务于业务需求方式的重大进步。这些智能体能够理解上下文、处理自然语言并协调复杂的工作流程,从而带来超越传统自动化方法的变革机遇。

The key is understanding that Level 3 agents represent a significant advancement in how automation can serve business needs. These agents’ ability to understand context, process natural language, and orchestrate complex workflows offers opportunities for transformation that go beyond traditional automation approaches.

展望未来,人工智能代理的演进将持续重塑组织的运作方式。无论是在现有自动化能力的基础上进行扩展,还是从零开始构建更先进的实施方案,成功的关键都在于理解有效利用这些强大工具所需的技术能力和组织变革。

As we look to the future, the evolution of AI agents will continue to reshape how organizations operate. Whether building on existing automation capabilities or starting fresh with more advanced implementations, the key to success lies in understanding both the technical capabilities and organizational changes required to effectively leverage these powerful tools.

利用生成式人工智能和人工智能代理实现全面的人工智能企业转型

Leveraging Generative AI and AI Agents for a Holistic AI Corporate Transformation

当我们与一家全球制造公司的管理团队在新加坡的一间会议室里开会时,一场有趣的辩论展开了。首席技术官坚持要构建人工智能代理来实现运营自动化,而创新主管则热情地倡导用生成式人工智能工具赋能员工。他们当时可能并未意识到,这种看似矛盾的观点,竟会引领我们对未来工作模式做出最令人着迷的发现之一。

As we sat in a conference room in Singapore with the executive team of a global manufacturing company, an interesting debate unfolded. The CTO was adamant about building AI agents to automate their operations, while the Head of Innovation passionately advocated for empowering employees with generative AI tools. Little did they know that this apparent tension would lead us to one of our most fascinating discoveries about the future of work.

接下来的六个月,我们见证了人工智能如何变革组织机构,这为我们提供了一个完美的实验平台。这种变革并非通过单一途径,而是通过两种互补的力量,精心结合后,其威力远超任何一种力量单独作用所能达到的程度。

What we witnessed over the next six months became a perfect laboratory for understanding how AI transforms organizations. Not through one pathway, but through two complementary forces that, when combined thoughtfully, create something far more powerful than either could achieve alone.

我们先来看一个我们与他们销售团队进行的实验。我们要求一半的团队成员继续照常工作,而另一半团队成员则可以同时使用生成式人工智能工具和我们开发的一个基础人工智能代理。结果令人惊讶,但原因却出乎我们的意料。

Let’s start with an experiment we conducted with their sales team. We asked half the team to continue working as usual, while the other half got access to both generative AI tools and a basic AI agent we had developed. The results were striking, but not for the reasons we expected.

黛博拉是他们公司业绩最好的员工之一,她发现自己开始依赖人工智能代理在后台默默处理会议安排、后续跟进和项目进度更新等工作,从而腾出更多时间。但真正让我们注意到的是她如何利用这些新腾出的时间。她运用生成式人工智能来撰写更具说服力的提案,并与客户进行个性化沟通。“我终于可以做我一直想做的事情了,”她告诉我们,“我不再把数小时耗费在安排会议的琐碎事务上、苦于没有时间从战略角度思考客户交易,而是能与客户就他们的长期挑战进行越来越多的深入对话。”

Deborah, one of their top performers, found herself relying on an AI agent to quietly handle her meeting scheduling, follow-ups, and pipeline updates in the background to free up her time. But what really caught our attention was how she used her newly freed time. She leveraged generative AI to craft more persuasive proposals and personalized client communications. “I’m finally able to do what I always wanted,” she told us. “Instead of spending hours on the logistics of organizing meetings and struggling to strategically think about my client deals, I’m having more and more deep conversations with clients about their long-term challenges.”

这次经历生动地展现了我们对人工智能在组织中带来的双重变革的理解。不妨把它想象成同时学习驾驶自动挡汽车和使用GPS导航。自动挡(如同人工智能代理)负责处理机械方面的复杂性,而GPS(如同生成式人工智能)则增强了你的导航能力,帮助你做出更明智的决策。二者结合,不仅能让你成为更高效的驾驶员,更能彻底改变你的出行体验。

This experience illustrated what we’ve come to understand as the dual transformation that AI enables in organizations. Think of it as learning to drive with both an automatic transmission and a GPS. The automatic transmission (like AI agents) handles the mechanical complexity, while the GPS (like generative AI) enhances your ability to navigate and make better decisions. Together, they don’t just make you a more efficient driver—they transform the entire experience of traveling.

生成式人工智能与人工智能代理的融合为企业带来了变革性的机遇。我们发现,这种双管齐下的方法能够创造出远超各部分之和的成果。这些技术不仅重新定义了企业的运营方式,更赋能员工,让他们充分发挥自身最宝贵的人性特质——创造力、同理心、批判性思维和个人成长。这种双管齐下的方法能够培养出一支能够取得卓越成果的员工队伍,同时也将重新定义未来的工作模式。

The fusion of generative AI and AI agents presents a transformative opportunity for organizations. We’ve discovered that this dual approach creates something greater than the sum of its parts. Together, these technologies not only redefine the way companies operate but also empower employees to embrace their most human qualities—creativity, empathy, critical thinking, and personal growth. This dual approach fosters a workforce capable of achieving exceptional outcomes while redefining the future of work.

人工智能在企业转型中的双重角色

The Dual Role of AI in Corporate Transformation

人工智能的变革潜力大致可以分为两个相互依存的途径。

AI’s transformative potential can be broadly categorized into two interdependent pathways.

第一种是生成式人工智能,这项技术通过赋能员工以前所未有的速度和深度与他人沟通、进行批判性思考和创造,从而增强人类能力。生成式人工智能工具将撰写文稿、创作内容或分析复杂数据等任务转化为赋能时刻。借助这些工具,员工将成为“超人”,能够放​​大自身优势,应对以往难以企及的挑战。

The first is Generative AI, a technology that enhances human capabilities by enabling employees to connect with others, think critically, and create with unprecedented speed and depth. Generative AI tools transform tasks such as drafting communications, creating content, or analyzing complex data into moments of empowerment. With these tools, employees become “superhumans,” equipped to amplify their natural strengths and tackle challenges previously beyond their reach.

第二条途径是通过人工智能代理,这些系统旨在处理重复性、复杂、敏感甚至危险的任务。这些代理能够自动化流程,节省时间、减少错误,同时还能承担繁琐的工作。通过承担这些角色,人工智能代理使员工能够专注于更有意义的活动,无论这些活动是在其职业角色中做出战略性贡献,还是追求个人目标和丰富人生体验。这些技术共同帮助企业实现两大目标:在提高效率的同时,充分释放员工的潜能。

The second pathway is through AI Agents, systems designed to handle repetitive, complex, sensitive, or even hazardous tasks. These agents automate processes, saving time and reducing errors while managing tedious responsibilities. By taking on these roles, AI agents free employees to focus on more meaningful activities, whether those activities involve strategic contributions in their professional roles or pursuing personal goals and enriching experiences in their lives. Together, these technologies allow companies to achieve two goals: scaling efficiency while unlocking the full human potential of their workforce.

利用生成式人工智能增强人类特质

Strengthening Human Qualities with Generative AI

根据我们的经验,生成式人工智能已成为一股普及化的力量,使员工能够更高效、更具创造性地完成任务。与以往的人工智能浪潮不同,生成式人工智能能够积极地补充人类智能。例如,我们看到,营销人员利用生成式人工智能可以更快地制作出引人入胜的营销活动,而管理者则可以更细致地分析绩效数据,从而做出更明智、更快速的决策。

According to our experience, generative AI has emerged as a democratizing force, enabling employees to perform tasks more effectively and creatively. Unlike previous waves of AI, generative AI actively complements human intelligence. We’ve seen, for example, how a marketing professional using generative AI can produce engaging campaigns faster, while a manager can analyze performance data with greater nuance to make smarter, faster decisions.

我们观察到的一个悖论是,尽管超过 90% 的公司都在谨慎地尝试生成式人工智能,但全球 75% 的知识工作者已经每天都在使用它,其中许多人甚至使用未经授权的工具。这种草根式的普及表明了生成式人工智能的直观吸引力——它易于使用、能够立即产生结果,并且能够无缝融入工作流程。员工的行动速度超过了雇主,展现出一种由个人而非企业指令驱动的自下而上的变革。

A paradox we’ve observed is that while over 90% of companies cautiously experiment with generative AI, 75% of knowledge workers globally already use it daily, with many resorting to unsanctioned tools. This grassroots adoption demonstrates the intuitive appeal of generative AI—it’s simple to use, delivers immediate results, and integrates seamlessly into workflows. Employees are outpacing their employers, showcasing a bottom-up transformation driven by individuals rather than corporate mandates.

这种转变凸显了企业大规模应用生成式人工智能的必要性。通过提供结构化的培训、开放的案例分享论坛和协作平台,企业可以将这种自发的采用转化为可衡量的改进。例如,我们看到一些企业每周举办研讨会,让员工分享他们如何使用生成式人工智能,从而营造出一种集体学习和创新的文化。

This shift underscores the need for companies to embrace generative AI at scale. By providing structured training, open forums for sharing use cases, and platforms for collaboration, businesses can channel this organic adoption into measurable improvements. For instance, we’ve seen organizations host weekly sessions where employees share how they use generative AI, creating a culture of collective learning and innovation.

生成式人工智能不仅节省时间,还能创造时间从事更高价值的活动。我们曾与一些团队合作,其中产品设计师利用人工智能快速构建原型,从而腾出更多时间进行头脑风暴和创意探索。销售团队摆脱了手动撰写邮件的繁琐工作,可以将精力集中在深化客户关系上。生成式人工智能并非要取代人类,而是要提升人类的能力。

Generative AI doesn’t just save time—it creates time for higher-value activities. We’ve worked with teams where a product designer uses AI to rapidly prototype ideas, reserving more time for brainstorming and creative exploration. A sales team, unburdened by the manual drafting of emails, can instead focus their energy on deepening client relationships. Generative AI is not about replacing humans; it’s about elevating them.

利用人工智能代理实现复杂性自动化

Automating Complexity with AI Agents

根据我们的经验,人工智能代理在企业转型中扮演着默默奉献的角色,尤其擅长那些需要精准、快速和可靠的领域。我们看到它们能够处理诸如发票处理、物流协调或合规性检查等重复性任务,并且常常能够跨越组织壁垒,实现端到端流程的优化。

According to our experience, AI agents operate as silent partners in corporate transformation, excelling in areas that demand precision, speed, and reliability. We’ve seen them handle repetitive tasks such as invoice processing, logistics coordination, or compliance checks, often working across organizational silos to optimize processes end-to-end.

例如,在我们与一家金融服务公司的合作中,部署人工智能代理来管理账单操作带来了显著的改进。过去需要多个部门协作才能完成的任务——例如验证发票、跟踪供应商绩效和核对差异——现在都可以由单个人工智能代理无缝处理。这种自动化将处理时间缩短了 40%,并减少了 85% 的错误,使员工能够专注于战略和客户互动。

For example, in our work with a financial services company, deploying AI agents to manage billing operations led to dramatic improvements. Tasks that once required multiple departments—validating invoices, tracking vendor performance, and reconciling discrepancies—were seamlessly handled by a single AI agent. This automation reduced processing time by 40% and eliminated 85% of errors, freeing employees to focus on strategy and client engagement.

然而,人工智能代理的影响远不止于提高效率。通过自动化处理繁琐的工作,公司可以重新分配资源,将人力投入到那些能够充分发挥创造力、同理心和决策能力的领域。以前深陷重复性工作的员工现在可以专注于解决问题、指导同事或建立更牢固的客户关系。

However, the impact of AI agents extends far beyond efficiency. By automating the mundane, companies can reallocate human effort to areas where creativity, empathy, and decision-making shine. Employees previously bogged down in repetitive tasks can instead focus on problem-solving, mentoring colleagues, or building stronger customer relationships.

根据我们的观察,这种方法需要自上而下的战略性举措。与依赖个体实验的生成式人工智能不同,部署人工智能代理需要周密的计划、跨部门的协调以及健全的治理。组织必须明确目标、确保数据完整性,并设计能够无缝集成到现有工作流程中的代理,才能充分发挥其变革潜力。

From what we’ve observed, this approach requires a strategic, top-down initiative. Unlike generative AI, which thrives on individual experimentation, deploying AI agents demands careful planning, cross-departmental coordination, and robust governance. Organizations must define clear goals, ensure data integrity, and design agents that integrate seamlessly into existing workflows to achieve their full transformative potential.

统一愿景:生成式人工智能与人工智能代理协同工作

A Unified Vision: Generative AI and AI Agents Working Together

以我们最近与一家全球制造企业的合作为例。当我们最初与他们的领导团队探讨人工智能转型时,他们陷入了我们所谓的“非此即彼”的困境——他们认为必须在投资用于员工的生成式人工智能工具和开发用于流程自动化的人工智能代理之间做出选择。我们共同发现,真正的奇迹发生在两种方法协同运作之时。

Consider our recent work with a global manufacturing company. When we first started discussing AI transformation with their leadership team, they were caught in what we call the “either-or trap”—believing they needed to choose between investing in generative AI tools for their workforce or developing AI agents for process automation. What we discovered together was that the real magic happens when both approaches work in concert.

人工智能的真正力量在于其整合。生成式人工智能和人工智能代理并非相互竞争的技术,而是相辅相成的力量,共同推动着同一个愿景。以正在经历转型的人力资源部门为例:

The true power of AI lies in its integration. Generative AI and AI agents are not competing technologies but complementary forces driving a singular vision. Consider an HR department undergoing the transformation:

生成式人工智能帮助人力资源专业人员制定个性化培训计划、分析员工敬业度数据并创建有效的沟通策略。

Generative AI helps HR professionals craft personalized training plans, analyze employee engagement data, and create impactful communication strategies.

人工智能代理可以自动执行诸如处理工资、管理福利和协调日程安排等日常任务。

AI Agents automate routine tasks like processing payroll, managing benefits, and coordinating schedules.

这些技术共同使人力资源团队能够专注于最重要的事情:建立关系、促进成长和创造一个让员工蓬勃发展的环境。

Together, these technologies enable the HR team to focus on what matters most: building relationships, fostering growth, and creating an environment where employees thrive.

这种整体方法不仅改变了工作流程,也改变了思维模式。员工不再将技术视为威胁,而是视为赋能工具,它能提升员工能力,并让他们参与到能够带来价值和满足感的活动中。

This holistic approach transforms not just workflows but mindsets. Employees see technology not as a threat but as an enabler, enhancing their capabilities while allowing them to engage in activities that bring value and satisfaction.

从理论到实践:构建转型框架

From Theory to Practice: Building a Transformation Framework

根据我们的经验,要成功实施这种双管齐下的方法,组织必须采用结构化的转型框架:

From our experience, to successfully implement this dual approach, organizations must adopt a structured transformation framework:

1. 从生成式人工智能入手:鼓励员工在日常工作中尝试使用生成式人工智能工具。提供培训,激励员工采用,并创建论坛分享最佳实践。这种自下而上的策略既能激发员工对人工智能的兴趣和信任,又能立即提升生产力。

1. Start with Generative AI: Encourage employees to experiment with generative AI tools in their daily tasks. Provide training, incentivize adoption, and create forums for sharing best practices. This bottom-up strategy builds excitement and trust in AI while delivering immediate productivity gains.

2. 利用人工智能代理实现规模化:采用自上而下的方法部署人工智能代理,以实现重复性和复杂流程的自动化。首先开展试点项目,完善实施方案,然后逐步推广到各个部门。让员工参与设计与其工作流程相辅相成的代理,从而培养他们的主人翁意识。

2. Scale with AI Agents: Use a top-down approach to deploy AI agents for automating repetitive and complex processes. Start with pilot projects to refine the implementation and gradually scale across departments. Involve employees in designing agents that complement their workflows, fostering a sense of ownership.

3. 营造学习和适应文化:投资于持续教育,帮助员工将人工智能融入到工作中。重点宣传成功案例,鼓励团队进行试验、学习和创新。

3. Foster a Culture of Learning and Adaptation: Invest in continuous education to help employees integrate AI into their roles. Highlight success stories and encourage teams to experiment, learn, and innovate.

4. 将指标与战略目标保持一致:定义能够衡量效率提升和以人为本的关键绩效指标 (KPI)。评估成果,例如员工满意度、创造力和协作能力。定期审查这些指标,以确保转型始终与组织目标保持一致。

4. Align Metrics with Strategic Goals: Define KPIs that measure both efficiency gains and human-centric outcomes, such as employee satisfaction, creativity, and collaboration. Regularly review these metrics to ensure the transformation stays aligned with organizational objectives.

5. 优先考虑治理和伦理:建立治理结构,确保人工智能的合乎伦理的使用、数据隐私和透明度。创建论坛,让员工能够表达意见并参与塑造转型历程。

5. Prioritize Governance and Ethics: Establish governance structures that ensure ethical AI use, data privacy, and transparency. Create forums for employees to voice concerns and participate in shaping the transformation journey.

当智能体失控时:为人工智能系统构建必要的安全保障

When Agents Go Rogue: Building Essential Safeguards for AI Systems

早在2010年,在ChatGPT或自主代理出现之前,华尔街就经历了一场后来被称为“闪崩”的事件。短短36分钟内,自动交易算法就让近1万亿美元的市值蒸发殆尽。尽管市场最终得以复苏,但这次事件让我们提前看到了,如果我们在缺乏适当保障措施的情况下赋予人工智能代理过多的自主权,将会带来怎样的后果。

In 2010, well before anyone was talking about ChatGPT or autonomous agents, Wall Street experienced what would later be called the “Flash Crash.” In just 36 minutes, automated trading algorithms wiped out nearly $1 trillion in market value. While the market eventually recovered, this incident offered an early glimpse of what can go wrong when we give AI agents too much autonomy without proper safeguards.

这本不该发生。算法原本按照预设程序运行——根据市场情况进行买卖。但它们之间的交互却产生了一个意想不到的反馈循环,机器人之间展开了美国证券交易委员会后来称之为“烫手山芋”式的交易,以越来越剧烈的价格来回传递合约。虽然没有任何单一实体有过错,但它们共同作用,几乎导致市场崩盘。

This wasn’t supposed to happen. The algorithms were doing exactly what they were programmed to do—buy and sell based on market conditions. But their interactions with each other created an unexpected feedback loop, with bots engaging in what the Securities and Exchange Commission later described as a “hot potato” pattern of trading, passing contracts back and forth at increasingly volatile prices. No single entity was at fault, yet together, they nearly crashed the market.

随着人工智能代理变得越来越复杂和普及,我们看到类似的情况以不同的方式重演。在我们与一家大型金融机构合作,为其部署首个人工智能交易代理时,我们也犯了类似的错误。该系统旨在优化交易模式,但我们之前没有充分考虑该系统将如何与市场上其他自动化交易系统交互。部署后几个小时内,该代理就启动了一系列快速交易,虽然从技术上讲这些交易是盈利的,但却引起了监管合规系统的警觉。我们不得不迅速将其关闭并重新评估我们的方案。

We’ve seen this story repeat itself in different ways as AI agents become more sophisticated and widespread. During our work with a major financial institution implementing their first AI-powered trading agents, we made a similar mistake. The system was designed to optimize trading patterns, but we hadn’t properly accounted for how it would interact with other automated trading systems in the market. Within hours of deployment, the agent had initiated a series of rapid-fire trades that, while technically profitable, raised red flags with regulatory compliance systems. We had to shut it down quickly and reassess our approach.

这些经验给我们上了至关重要的一课:对于人工智能代理而言,即使是出于好意的设计也可能导致意想不到的后果。挑战不仅在于如何让代理正确运行,更在于如何让它们在由人类和人工智能主体构成的复杂生态系统中安全且合乎伦理地运行。

These experiences taught us a crucial lesson: When it comes to AI agents, even well-intentioned designs can lead to unintended consequences. The challenge isn’t just about making agents work correctly—it’s about making them work safely and ethically within a complex ecosystem of human and artificial actors.

当好代理变坏时

When Good Agents Go Bad

想想加拿大航空在2024年初遇到的情况。他们部署了一个旨在提供帮助和信息的客服聊天机器人。该机器人可以访问加拿大航空的网站,并被指示协助客户解答疑问。听起来很简单,对吧?然而,事情却出了岔子。该机器人开始提供远超加拿大航空实际政策的丧葬优惠票价信息。当客户试图申请这些优惠时,加拿大航空试图拒绝,理由是机器人的说法不具有约束力。他们在法庭上败诉,法庭裁定该公司应对其人工智能代理的承诺负责。

Consider what happened to Air Canada in early 2024. They deployed a customer service chatbot that was designed to be helpful and informative. The bot had access to Air Canada’s website and was instructed to assist customers with their queries. Simple enough, right? However, things went sideways when the bot began providing information about bereavement fares that were far more generous than Air Canada’s actual policy. When customers tried to claim these fares, Air Canada attempted to deny them, arguing that the bot’s statements weren’t binding. They lost that argument in court, with the tribunal ruling that the company was responsible for their AI agent’s promises.

这次事件凸显了人工智能代理的一个根本性挑战:它们可以独立运行,其行为往往超出了开发者的预期,也难以控制。加拿大航空的聊天机器人并没有出现故障——它完全按照预设程序执行任务:帮助客户。但它对“帮助”的理解却导致了公司未授权的承诺。

This incident highlights a fundamental challenge with AI agents: they can operate independently in ways that their creators didn’t anticipate and can’t easily control. The Air Canada bot wasn’t malfunctioning—it was doing exactly what it was programmed to do: help customers. But its interpretation of “help” led to commitments that the company hadn’t authorized.

我们在自己的工作中也遇到过类似的问题。在帮助一家零售客户部署用于库存管理的AI代理时,我们发现该代理开始自动订购库存水平下降的产品。从理论上讲,这似乎合情合理。但实际上,这导致了大量季节性商品的超额订购,而这些商品原本是要逐步淘汰的。该代理并不了解季节性零售周期和库存策略的更广泛背景。

We’ve encountered similar issues in our own work. While helping a retail client implement an AI agent for inventory management, we discovered that the agent had begun automatically placing orders for products that showed declining stock levels. On paper, this seemed logical. In practice, it led to massive over-ordering of seasonal items that were intentionally being phased out. The agent didn’t understand the broader context of seasonal retail cycles and inventory strategy.
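The over-ordering failure can be guarded against by giving the agent the context it lacked: a reorder decision that consults the product's lifecycle status before acting. The fields and thresholds below are invented for illustration, not the retailer's actual data model.

```python
from dataclasses import dataclass

@dataclass
class Product:
    sku: str
    stock: int
    reorder_point: int
    phased_out: bool  # seasonal item being intentionally discontinued

def should_reorder(p: Product) -> bool:
    """Naive rule ('stock is falling, so buy more') plus the missing context check."""
    if p.phased_out:  # the broader context the retail agent lacked
        return False
    return p.stock < p.reorder_point
```

A one-field check is all it takes here, but the general lesson is that the guard has to be designed in by people who understand the business cycle; the agent will not infer it from stock levels alone.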

风险的新前沿

The New Frontier of Risk

如今的人工智能代理与传统自动化有何不同?答案在于我们在各行业实施过程中观察到的三个关键特征:

What makes today’s AI agents different from traditional automation? The answer lies in three key characteristics that we’ve observed through our implementations across industries:

首先,现代人工智能代理能够理解并执行高层次、通常较为模糊的目标。与遵循僵化规则的传统软件不同,这些代理可以接收诸如“提升客户满意度”之类的通用指令,并独立决定如何实现目标。这种灵活性固然强大,但也存在风险——就像加拿大航空的聊天机器人一样,它们可能会选择一些技术上能够达成目标,但却会引发其他问题的路径。

First, modern AI agents can interpret and act on high-level, often vague goals. Unlike traditional software that follows rigid rules, these agents can take a general instruction like “improve customer satisfaction” and independently determine how to achieve it. This flexibility is powerful but dangerous—like the Air Canada bot, they may choose paths that technically achieve the goal but create other problems.

其次,它们能够以前所未有的方式与世界互动。它们可以访问数据库、发送电子邮件、下订单,甚至控制物理系统。我们合作过的一家制造企业客户部署了一个人工智能代理来优化其生产线。该代理能够根据质量指标实时调整机器设置。虽然这提高了效率,但也导致偶尔出现的突然变化,令无法预测或理解代理决策的人工操作员感到困惑和沮丧。

Second, they can interact with the world in unprecedented ways. They can access databases, send emails, place orders, and even control physical systems. One manufacturing client we worked with implemented an AI agent to optimize their production line. The agent had the ability to adjust machine settings in real-time based on quality metrics. While this led to improved efficiency, it also resulted in occasional sudden changes that confused and frustrated human operators who couldn’t predict or understand the agent’s decisions.

第三,这些智能体无需直接监督即可无限期运行。它们不会下班或休息,即使最初促使它们被创建的条件发生变化,它们也能继续执行预先设定的目标。当智能体在目标和约束条件随时间演变的动态环境中运行时,这种持续性尤其会带来问题。

Third, these agents can operate indefinitely without direct supervision. They don’t clock out or take breaks, and they can continue executing their programmed objectives long after the original conditions that prompted their creation have changed. This persistence can be particularly problematic when agents are operating in dynamic environments where goals and constraints evolve over time.

构建必要的安全措施

Building Essential Safeguards

通过我们的经验和错误,我们制定了一套用于实施人工智能代理必要安全保障的框架。这些不仅仅是理论上的指导原则,而是我们通过惨痛的教训才总结出的实用原则。

Through our experiences and mistakes, we’ve developed a framework for implementing essential safeguards for AI agents. These aren’t just theoretical guidelines—they’re practical necessities that we’ve learned the hard way.

交易管理保障措施

Transaction Management Safeguards

第一道防线是健全的交易管理。这意味着对任何能够做出财务或资源承诺的代理人,都要实施明确的限额和监督机制。我们现在始终建议:

The first line of defense is robust transaction management. This means implementing clear limits and oversight mechanisms for any agent that can make financial or resource commitments. We now always recommend:

对交易规模和频率设定硬性限制

Hard limits on transaction sizes and frequencies

超过一定阈值的交易需经过多层审批

Multiple approval layers for transactions above certain thresholds

能够检测异常模式的实时监控系统

Real-time monitoring systems that can detect unusual patterns

所有代理操作均有清晰的审计跟踪记录

Clear audit trails for all agent actions
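To make these recommendations concrete, here is a minimal Python sketch of how such transaction guardrails might fit together. The `TransactionGuard` class and its thresholds are hypothetical illustrations, not a production system; a real deployment would route escalations to a human approver and persist the audit trail durably:

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

@dataclass
class TransactionGuard:
    max_amount: float          # hard limit: never allow above this
    approval_threshold: float  # above this, escalate to a human approver
    max_per_hour: int          # frequency cap on agent-initiated transactions
    history: list = field(default_factory=list)  # audit trail of every decision

    def review(self, amount: float, now: datetime) -> str:
        # Count transactions the agent attempted in the last hour.
        recent = [t for t, _ in self.history if now - t < timedelta(hours=1)]
        if amount > self.max_amount:
            decision = "reject"
        elif amount > self.approval_threshold or len(recent) >= self.max_per_hour:
            decision = "escalate"  # unusual size or pattern: require human approval
        else:
            decision = "allow"
        self.history.append((now, (amount, decision)))  # clear audit trail
        return decision
```

Layering a hard ceiling over an approval threshold means a single misconfigured threshold cannot silently authorize unbounded spending, which is the failure mode in the inventory example above.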

我们的一位零售客户在其库存管理代理失控后实施了这些安全措施。现在,任何超过一定金额的订单都需要人工审批,系统还会自动标记异常的订购模式以供审核。这既避免了多起潜在的超额订购事件,又保留了自动化带来的效率优势。

One retail client we work with implemented these safeguards after their inventory management agent went rogue. Now, any order above a certain dollar amount requires human approval, and the system automatically flags unusual ordering patterns for review. This has prevented several potential over-ordering incidents while still maintaining the efficiency benefits of automation.

道德准则与合规性

Ethical Guidelines and Compliance

在智能体设计中,伦理问题绝不能被忽视。我们已经学会将伦理约束直接融入智能体的决策过程中。这包括:

Ethics can’t be an afterthought in agent design. We’ve learned to embed ethical constraints directly into agent decision-making processes. This includes:

明确界定可接受和不可接受的行为

Clear definitions of acceptable and unacceptable actions

定期对代理的行为进行道德审计

Regular ethical audits of agent behavior

利益相关者质疑代理决策的机制

Mechanisms for stakeholders to challenge agent decisions

代理决策的透明度要求

Transparency requirements for agent decision-making
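One way to embed such constraints directly into an agent's decision process, rather than auditing after the fact, is to screen every proposed action against explicit rules before execution. The following Python sketch is a hypothetical illustration; the `vet_action` function and the constraint format are our own invention, not a standard API:

```python
def vet_action(action, constraints):
    """Screen a proposed agent action against explicit constraints before
    execution. A blocked action carries the violated rule's name, so the
    decision is transparent and auditable."""
    for rule in constraints:
        if not rule["check"](action):
            return {"allowed": False, "reason": rule["name"]}
    return {"allowed": True, "reason": None}
```

Returning the name of the violated rule, rather than a bare yes/no, supports both the transparency requirement and the stakeholder-challenge mechanism listed above: there is always a stated reason to contest.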

我们前面提到的那家金融机构现在要求其所有交易代理定期接受道德审计,不仅要审查其是否遵守法规,还要审查其交易模式对市场稳定的更广泛影响。

The financial institution we mentioned earlier now requires all their trading agents to undergo regular ethical audits, examining not just compliance with regulations but also the broader impact of their trading patterns on market stability.

安全控制

Safety Controls

安全保障措施至关重要,尤其对于控制物理系统或做出影响人身安全的决策的代理而言更是如此。关键要素包括:

Safety safeguards are crucial, especially for agents that control physical systems or make decisions that affect human safety. Key elements include:

紧急停机程序

Emergency shutdown procedures

定期安全检查和验证

Regular safety checks and validations

冗余监控系统

Redundant monitoring systems

明确的安全监管责任链

Clear chains of responsibility for safety oversight

我们的一位制造企业客户在使用生产线优化代理程序几次险些发生事故后,实施了一套“人工干预”系统。现在,操作员如果发现潜在的安全隐患,可以立即暂停代理程序发起的任何更改。

A manufacturing client we work with implemented a “human override” system after several close calls with their production line optimization agent. Operators can now instantly pause any agent-initiated changes if they spot potential safety issues.
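A "human override" gate of this kind can be sketched in a few lines of Python. The `OverrideController` class below is a hypothetical illustration under simplified assumptions: while the override is active, agent-initiated changes are held for human review instead of being applied:

```python
import threading

class OverrideController:
    """Gate for agent-initiated changes; operators can pause them at any time."""

    def __init__(self):
        self._paused = threading.Event()  # set = override engaged

    def pause(self):
        """Operator hits the override: stop applying agent changes."""
        self._paused.set()

    def resume(self):
        self._paused.clear()

    def apply_change(self, change, apply_fn):
        # While paused, hold the change for human review rather than apply it.
        if self._paused.is_set():
            return ("held", change)
        return ("applied", apply_fn(change))
```

Using a `threading.Event` keeps the check safe to call from a monitoring thread while the agent runs elsewhere; the essential property is that the pause takes effect before the next change, not after.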

隐私保护

Privacy Protection

随着代理处理越来越多的敏感数据,隐私保护措施变得日益重要。必要的保护措施包括:

Privacy safeguards have become increasingly critical as agents handle more sensitive data. Essential protections include:

严格的数据访问控制

Strict data access controls

定期进行隐私影响评估

Regular privacy impact assessments

明确的数据保留和删除政策

Clear data retention and deletion policies

处理个人数据请求的机制

Mechanisms for handling personal data requests

我们曾因此吸取过惨痛的教训:我们部署的一个客服代理在没有适当隐私控制措施的情况下,开始存储详细的客户互动日志。现在,我们在设计阶段就实施隐私保护,制定明确的政策,规定可以收集哪些数据以及如何处理这些数据。

We learned this lesson the hard way when a customer service agent we implemented began storing detailed customer interaction logs without proper privacy controls. Now, we implement privacy protection at the design stage, with clear policies about what data can be collected and how it should be handled.
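Designing privacy in from the start often means expressing the retention policy as code the system enforces, not just as a document. Here is a small Python sketch of that idea; the data categories and retention periods are hypothetical examples:

```python
from datetime import datetime, timedelta

# Hypothetical per-category retention policy: how long each kind of
# record may be kept before it must be deleted.
RETENTION = {
    "interaction_log": timedelta(days=30),
    "order_record": timedelta(days=365),
}

def expired(records, now):
    """Return the records that are past their retention period and due
    for deletion. Each record carries its category and creation time."""
    return [r for r in records if now - r["created"] > RETENTION[r["category"]]]
```

A periodic job that calls `expired` and deletes the results turns the retention policy into a guarantee rather than a guideline, and gives auditors a single place to verify what data the agent is allowed to keep.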

展望未来:代理安全的未来

Looking Ahead: The Future of Agent Safety

随着人工智能代理变得越来越复杂和普及,这些安全措施的重要性只会与日俱增。我们已经看到所谓的“代理生态系统”的出现——人工智能代理网络以日益复杂的方式彼此交互,并与人类系统进行交互。

As AI agents become more sophisticated and widespread, the importance of these safeguards will only grow. We’re already seeing the emergence of what we call “agent ecosystems”—networks of AI agents interacting with each other and with human systems in increasingly complex ways.

未来的挑战不仅在于控制单个智能体,更在于理解和管理这些生态系统。这需要全新的安全和治理方法,以应对智能体交互过程中涌现出的各种行为。我们目前正与多家客户合作开发管理这些生态系统的框架,但我们也谦逊地承认,我们都在不断学习进步。

The challenge ahead isn’t just about controlling individual agents—it’s about understanding and managing these ecosystems. This requires new approaches to safety and governance that can handle the emergent behaviors that arise from agent interactions. We’re currently working with several clients to develop frameworks for managing these ecosystems, but we’re also humble enough to acknowledge that we’re all learning as we go.

人工智能的未来既令人兴奋又令人担忧。这些技术在提升效率和创新方面蕴藏着巨大的潜力,但同时也存在着我们才刚刚开始了解的风险。我们讨论过的保障措施并非完美无缺,但它们代表了我们目前对如何驾驭人工智能的强大功能并防范其潜在危险的最佳理解。

The future of AI agents is both exciting and daunting. While these technologies offer tremendous potential for improving efficiency and innovation, they also present risks that we’re only beginning to understand. The safeguards we’ve discussed aren’t perfect, but they represent our best current understanding of how to harness the power of AI agents while protecting against their potential dangers.

关键在于保持警惕和灵活应变,从每一次新的挑战中吸取经验教训,并不断改进我们的安全保障措施。正如我们经常告诉客户的那样,我们的目标并非消除所有风险——这是不可能的。相反,我们的目标是创建能够安全运行并从错误中吸取教训的系统,就像我们自身一样。

The key is to remain vigilant and adaptive, learning from each new challenge and continuously improving our safeguards. As we often tell our clients, the goal isn’t to eliminate all risks—that’s impossible. Instead, we aim to create systems that can fail safely and learn from their mistakes, just as we do.

第十二章

CHAPTER 12

跨行业代理案例研究及应用案例

CASE STUDY AND USE CASES OF AGENTS ACROSS INDUSTRIES

在对人工智能代理实际应用的探索接近尾声之际,本章从两个重要角度总结了我们关于变革管理、规模化和执行的所有经验。首先,我们将深入剖析宠物之家(Pets at Home)的转型历程,它生动地展现了企业如何在规模化部署人工智能代理的同时,保持其至关重要的人性化服务。其次,我们将考察跨行业、跨职能的众多应用案例,这些案例充分展现了人工智能代理在实际应用中的多功能性和影响力。这些案例不仅是成功案例,更是您自身转型之旅的蓝图,为您提供切实可行的见解和行之有效的方法,您可以根据自身组织的具体情况进行调整。

As we conclude our exploration of practical AI agent implementation, this chapter brings together everything we’ve learned about change management, scaling, and execution through two powerful lenses. First, we dive deep into Pets at Home’s transformative journey, which exemplifies how organizations can successfully implement AI agents at scale while maintaining their essential human touch. Then, we examine a diverse collection of use cases across industries and functions that demonstrate the versatility and impact of AI agents in real-world settings. These examples aren’t just success stories—they’re blueprints for your own transformation journey, offering practical insights and proven approaches that you can adapt for your organization.

案例研究:引领企业人工智能代理转型:Pets at Home

Case Study: Pioneering Enterprise AI Agent Transformation: Pets at Home

在企业人工智能转型日新月异的今天,我们尤其兴奋地与大家分享 Pets at Home 的故事。我们认为 Pets at Home 是全球大规模应用智能体人工智能的先驱之一。他们的成就令人瞩目——从准确率高达 99.6% 的兽医咨询转录员,到彻底革新零售业务欺诈检测的自主智能体,无不体现着他们的卓越贡献。在人工智能转型与企业架构主管 Simon Ellis 的远见卓识的领导下,Pets at Home 开启了史无前例的转型之旅,为企业如何利用人工智能体树立了新的标杆。

In the rapidly evolving landscape of enterprise AI transformation, we’re particularly excited to share the story of Pets at Home, which we consider one of the global pioneers in implementing agentic AI at scale. Their achievements speak volumes—from an ambient digital scribe that transcribes veterinary consultations with 99.6% accuracy to autonomous agents that have revolutionized fraud detection across their retail operations. Under the visionary leadership of Simon Ellis, their Head of AI Transformation and Enterprise Architecture, the company has embarked on a first-of-its-kind transformation that is setting new standards for how enterprises can leverage AI agents.

作为英国最大的宠物护理公司,Pets at Home 拥有约 450 家零售店、450 家兽医诊所,以及每周为 17,000 只宠物提供服务的综合美容服务,其发展历程为企业级 AI 代理系统的实际应用提供了宝贵的见解。

As the UK’s largest pet care company, with approximately 450 retail stores, 450 veterinary practices, and a comprehensive grooming service that handles 17,000 pets weekly, Pets at Home’s journey offers valuable insights into the practical implementation of enterprise-wide AI agent systems.

挑战:统一复杂的企业

The Challenge: Unifying a Complex Enterprise

埃利斯加入Pets at Home时,面临着许多大型企业都会遇到的挑战:运营孤岛。公司各个业务部门——零售门店、兽医诊所、美容服务和线上业务——各自独立运作,导致顾客和员工的体验都缺乏连贯性。Pets at Home的宠物俱乐部拥有超过800万会员,并掌握着1000万只宠物的数据,埃利斯意识到,公司拥有巨大的潜力,可以更有效地利用这些信息,实现组织内部的协同运作。

When Ellis joined Pets at Home, he faced a challenge common to many large enterprises: operational silos. The company’s various business units—retail stores, veterinary practices, grooming services, and online operations—operated independently, creating disconnected experiences for both customers and employees. With over 8 million customers in their Pets Club program and data on 10 million pets, Ellis recognized an enormous opportunity to leverage this information more effectively across their organization.

战略愿景:超越简单的自动化

Strategic Vision: Beyond Simple Automation

Pets at Home转型与众不同之处在于其对跨领域人工智能代理的大胆愿景。他们并没有将人工智能的实施视为一系列孤立的自动化项目,而是设想未来人工智能代理将成为公司与其利益相关者之间的主要接口¹⁹³。埃利斯描述了他们雄心勃勃的愿景:“我们的愿景是为每一位客户创建一个人工智能数字助理……如果帕斯卡来到宠物之家,无论您是想查找产品信息、在线查询宠物的症状(因为您的宠物生病了)、查看订单状态,还是管理订阅——实际上,帕斯卡都会拥有一个了解您和您的宠物的数字助理。”

What sets Pets at Home’s transformation apart is its bold vision for transversal AI agents. Rather than viewing AI implementation as a series of isolated automation projects, they envision a future where AI agents become the primary interface between the company and its stakeholders.¹⁹³ Ellis describes their ambitious vision: “Our vision is to create an AI digital assistant for each of our customers... If Pascal comes to Pets at Home, it doesn’t matter whether you want to research a product, do some online symptom checking because your pet’s not well, check where your order is, or manage your subscription—effectively, Pascal will have a digital assistant that knows you and knows your pets.”

埃利斯进一步解释了这一愿景如何改变传统的数字互动:“这些代理,这些个性化代理,将在未来三到五年内真正改变零售市场。我认为MDH零售公司应该对此感到担忧,因为消费者以后可能不会再访问网站了。” 这标志着企业与客户互动方式的根本性转变,从传统渠道转向以个性化人工智能代理作为主要接触点。

Ellis further explains how this vision transforms traditional digital interactions: “These agents, these personalized agents, are the things that are actually going to transform the retail market in the next 3 or 4 or 5 years. It’s actually something I think MDH retail needs to be worried about because you won’t go to a website anymore.” This represents a fundamental shift in how companies interact with customers, moving from traditional channels to personalized AI agents as the primary touchpoint.

实施策略:从小处着手,放眼未来

Implementation Strategy: Starting Small but Thinking Big

Pets at Home 的实施策略体现了我们观察到的对企业成功进行人工智能转型至关重要的几个关键原则:

Pets at Home’s implementation strategy demonstrates several key principles that we’ve observed as crucial for successful enterprise AI transformations:

1. 高层支持与战略协同:转型始于高层的强力支持,其中包括将人工智能视为“下一次工业革命”的董事长。事实证明,这种高管支持对于推动整个组织的变革至关重要。他们没有成立大型委员会来寻找机会,而是专注于由充满变革热情的高管发起的小型项目。

1. Executive Sponsorship and Strategic Alignment: The transformation began with strong support from the top, including the chairman, who viewed AI as “the next industrial revolution.” This executive backing proved crucial for driving adoption across the organization. Instead of creating large committees to identify opportunities, they focused on smaller initiatives with executive sponsors who were passionate about driving change.

2. 基础先行:在深入研究高级人工智能实施方案之前,该公司投资在 Azure 上构建了一个强大的数据基础架构。该基础架构整合了来自各个业务部门的数据,为人工智能项目奠定了坚实的基础。他们认识到,结构化和非结构化数据都需要妥善组织,人工智能代理才能高效运行。

2. Foundation First: Before diving into advanced AI implementations, the company invested in building a robust data foundation in Azure. This infrastructure unified data from various operations, creating a solid base for AI initiatives. They recognized that both structured and unstructured data needed to be well-organized for AI agents to function effectively.

3. 通过有针对性的试点项目验证价值:在统一的客户体验背后,埃利斯设想了一个由专业代理组成的协同工作网络:“幕后可能有很多代理——可能有兽医代理、客服代理,也可能是零售销售代理。” 同样,对于员工,“我们也在构建同事助手”,根据他们的职位和专业知识提供个性化支持。例如,“如果我在店里工作,而且是水族专家,系统会说‘嗨,西蒙,我知道你是水族专家,我知道你是谁,这是所有最新的操作流程。’”

3. Proving Value Through Targeted Pilots: Behind this unified customer experience, Ellis envisions a network of specialized agents working together: “Behind the scenes, there could be a whole bunch of agents—there could be a veterinary agent, a customer service agent, that could be a retail sales agent.” Similarly, for employees, “we’re also building out a colleague assistant” that provides personalized support based on their role and expertise. For instance, “if I worked in the store and I was a specialist in aquarium, it will go ‘Hey Simon, I know you’re a specialist in aquarium, I know who you are, here’s all the latest operating procedures.’”

他们的第一个重大应用是为兽医诊所开发的一款环境式数字记录系统,该系统能够自动转录问诊内容并生成临床记录。这项试点项目通过规范文档流程、提高效率,同时保留对关键医疗决策的人工监督,展现了其立竿见影的价值。

Their first major implementation was an ambient digital scribe for veterinary practices, which automatically transcribes consultations and creates clinical notes. This pilot demonstrated immediate value by standardizing documentation and improving efficiency, while maintaining human oversight for critical healthcare decisions.

通过低代码/无代码平台实现规模化

Scaling Through Low-Code/No-Code Platforms

Pets at Home转型中最具创新性的方面之一——也是与我们在该领域的经验尤为契合的——是他们扩展AI代理开发规模的方法。他们没有仅仅依赖传统的软件开发方式,而是利用微软的Copilot Studio实现了代理创建的普及化。Ellis热情洋溢地描述了他如何使用低代码工具在一个上午构建两个原型代理,充分展现了快速部署和实验的潜力。

One of the most innovative aspects of Pets at Home’s transformation—and one that particularly resonates with our experience in the field—is their approach to scaling AI agent development. Rather than relying solely on traditional software development, they leveraged Microsoft’s Copilot Studio to democratize agent creation. Ellis’s enthusiasm is contagious as he describes building two prototype agents in a single morning using low-code tools, demonstrating the potential for rapid deployment and experimentation.

这种方法解决了企业人工智能转型面临的主要挑战之一:开发的可扩展性。通过让领域专家能够通过低代码界面创建和修改智能体,Pets at Home 可以在保持质量和一致性的同时加速转型。

This approach addresses one of the primary challenges in enterprise AI transformation: the scalability of development. By enabling subject matter experts to create and modify agents through low-code interfaces, Pets at Home can accelerate their transformation while maintaining quality and consistency.

主要用例和结果

Key Use Cases and Results

让我们一起来探索一下我们在 Pets at Home 看到的一些最引人入胜的应用案例,人工智能代理正在日常运营中发挥着真正的作用。

Let’s explore some of the most fascinating implementations we’ve seen at Pets at Home, where AI agents are making a real difference in day-to-day operations.

欺诈检测代理

Fraud Detection Agent

这款用于欺诈检测的自主代理程序完美地展现了人工智能在应对复杂商业挑战方面的变革性力量。高级欺诈经理凯·伯克比分享了一个极具启发性的例子:该代理程序能够识别出同一张破损包裹的照片被不同的人多次用于申请退款的情况——这种模式几乎不可能由人工手动识别。更令人兴奋的是,该系统的功能远不止于简单的欺诈检测;当出现多个真实投诉时,它还能帮助识别合法的产品问题,从而将欺诈检测转变为提升产品质量的宝贵工具。

The autonomous agent for fraud detection beautifully illustrates the transformative power of AI in tackling complex business challenges. Kay Birkby, the senior fraud manager, shared an illuminating example: the agent can spot when the same photograph of a damaged package is used multiple times by different people attempting to claim refunds—a pattern that would be nearly impossible for humans to detect manually. What’s particularly exciting is how the system goes beyond simple fraud detection; it also helps identify legitimate product issues when multiple genuine complaints arise, transforming fraud detection into a valuable tool for quality improvement.

临床文档助理

Clinical Documentation Assistant

我们对这款环境数字转录系统彻底革新兽医诊疗流程的方式印象尤为深刻。该系统使用标准电脑麦克风,即使在有背景噪音的情况下,也能实现高达 99.6% 的转录准确率——这一成就远超团队最初的预期。但真正让我们兴奋的是:除了简单的转录功能外,它还能规范不同诊所的医疗编码和文档记录,从而在以往可能存在个体差异的临床医生编码方式上实现一致性。这种标准化带来了一个意想不到的好处:它生成了更高质量的结构化数据,可用于机器学习和预测模型,从而形成良性循环,不断提升患者护理水平。

We’re particularly impressed by how the ambient digital scribe has revolutionized veterinary practice operations. The system achieves a remarkable 99.6% accuracy in transcription using standard PC microphones, even with background noise—a feat that exceeded the team’s initial expectations. But here’s what really gets us excited: beyond simple transcription, it standardizes medical coding and documentation across practices, creating consistency where individual clinicians might previously have coded things differently. This standardization has led to an unexpected bonus: it generates higher-quality structured data that can be used for machine learning and predictive models, creating a virtuous cycle of improvement in patient care.

保险整合代理

Insurance Integration Agent

我们认为这是一个绝佳的实用人工智能应用案例:该公司正在开发一款智能代理,可在兽医咨询期间与宠物保险单无缝集成。该代理会在兽医与宠物主人讨论治疗方案时自动查询保单承保范围,从而解决兽医和宠物主人常常不清楚哪些治疗项目在承保范围内的痛点。这种实时信息有助于医护人员和宠物主人就治疗方案做出更明智的决定。

In what we see as a brilliant example of a practical AI application, the company is developing an agent that seamlessly integrates with pet insurance policies during veterinary consultations. This agent will automatically check policy coverage while the veterinarian discusses treatment options with pet owners, addressing a common pain point where neither vets nor pet owners are always clear about what treatments are covered. This real-time information helps both practitioners and pet owners make more informed decisions about care options.

门店同事助理

Store Colleague Assistant

其中最令人振奋的应用之一是他们为零售员工开发的个性化人工智能助手,该助手能够根据员工的具体岗位和专业知识进行调整。试想一下,如果您是一位专营水族箱的店员,而您的人工智能助手了解您的专长,并主动提供相关的操作流程和产品信息,那该是多么便捷。这套系统不仅提高了培训效率,还能帮助员工提供更加专业的客户服务。

One of the most inspiring implementations is their personalized AI assistant for retail staff that adapts to specific roles and expertise. Imagine being a store colleague specializing in aquariums and having an AI assistant that knows your specialty and proactively provides relevant operating procedures and product information. This system not only improves training efficiency but enables staff to provide exceptionally informed customer service.

经验教训和最佳实践

Lessons Learned and Best Practices

通过对 Pets at Home 的发展历程的分析,我们发现了一些有趣的见解,我们认为这些见解对于任何着手进行 AI 代理转型的组织来说都至关重要。

Through our analysis of Pets at Home’s journey, we’ve uncovered some fascinating insights that we believe are crucial for any organization embarking on an AI agent transformation.

数据质量的关键作用

The Critical Role of Data Quality

Ellis 的团队发现了一个我们认为特别引人注目的结论:人工智能代理的成功很大程度上取决于结构化和非结构化数据的质量。他们通过经验了解到,虽然人类员工可以轻松处理不同文档或政策之间的冲突,但人工智能代理却难以应对这些不一致之处。Ellis 分享了一个富有启发性的观察:“如果你通过 Copilot 向 LLM 提供一个 SharePoint 目录并向它提出一个问题,它就能找到相关的知识;但如果是两个相互矛盾的信息,它就会……要么选择第一个,要么选择第二个,要么两者结合。” 这一洞见之所以如此深刻,是因为它挑战了我们以往对结构化数据的关注,并促使我们从更全面的角度思考企业知识管理。

Ellis’s team made a discovery that we find particularly compelling: success with AI agents depends heavily on the quality of both structured and unstructured data. They learned through experience that while human employees could easily handle conflicts between different documents or policies, AI agents struggled with these inconsistencies. Ellis shares an enlightening observation: “If you provide a SharePoint directory via Copilot to an LLM and ask it a question, it will find the knowledge, but if it’s two contradictory bits, it’ll kind of go... I’ll take the first one, or I’ll take the second one, or I’ll take a mix of the two.” What makes this insight so powerful is how it challenges our traditional focus on structured data and pushes us to think more holistically about knowledge management across the enterprise.
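The consistency problem Ellis describes can often be caught before documents ever reach an agent. The Python sketch below uses a deliberately simplified document format of our own invention (each document as a dictionary of policy statements) to flag the keys that different sources answer differently:

```python
from collections import defaultdict

def find_conflicts(documents):
    """Flag policy keys that different documents answer differently --
    the contradictions an LLM would otherwise silently pick between."""
    values = defaultdict(set)
    for doc in documents:
        for key, value in doc.items():
            values[key].add(value)
    # Keep only keys with more than one distinct answer across sources.
    return {k: sorted(v) for k, v in values.items() if len(v) > 1}
```

Running a check like this over a knowledge base turns an invisible failure mode (the agent picking one of two contradictory answers) into an explicit cleanup task for the content owners.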

高管赞助的力量

The Power of Executive Sponsorship

Pets at Home转型中最令人振奋的方面之一,是其高管团队给予的强大支持。董事长将人工智能视为“下一场工业革命”,这一愿景不仅奠定了基调,更在整个公司范围内激发了行动。他们策略的巧妙之处在于,他们没有组建庞大的委员会,而是专注于由充满热情的高管发起的小型项目。这种精准的策略使他们能够快速行动并展现价值,从而为更大规模的转型积蓄了势不可挡的动力。我们屡次见证过这种策略的成功,但很少有像Pets at Home这样目标明确、执行得如此透彻的案例。

One of the most inspiring aspects of Pets at Home’s transformation is its exceptionally strong executive support. Their chairman’s vision of AI as “the next industrial revolution” didn’t just set the tone—it catalyzed action across the organization. What’s particularly clever about their approach is how they focused on smaller initiatives with passionate executive sponsors rather than creating large committees. This targeted approach allowed them to move quickly and demonstrate value, building unstoppable momentum for larger transformations. It’s a strategy we’ve seen work time and again, but rarely executed with such clarity of purpose.

成本管理的演变

The Evolution of Cost Management

他们遇到的最引人入胜的挑战之一——也是我们在整个行业中普遍面临的挑战——是如何有效地管理人工智能成本。Ellis 指出,由于更高的令牌使用量和处理需求,运行智能体人工智能的成本可能高于标准自动化。每个智能体本质上都是定制的,因此传统的定价模式并不适用。我们尤其欣喜地看到,他们正与供应商合作,开创新的企业定价框架,以更好地反映人工智能智能体部署的独特性——这项工作很可能惠及整个行业。

One of the most intriguing challenges they encountered—and one we’re seeing across the industry—revolves around managing AI costs effectively. Ellis makes a fascinating point about how running agentic AI can be more expensive than standard automation due to higher token usage and processing requirements. Each agent is essentially bespoke, making traditional pricing models inadequate. We’re particularly excited to see how they’re working with vendors to pioneer new enterprise pricing frameworks that better reflect the unique nature of AI agent deployments—work that will likely benefit the entire industry.

平衡自动化与人工监督

Balancing Automation with Human Oversight

他们巧妙地平衡了自动化和人工监督,堪称人工智能应用的典范。在兽医领域,他们确保所有人工智能生成的内容都经过人工监督;而在零售运营中,他们则允许更多自动化。正如埃利斯精辟地指出:“只要我们对答案的准确性和相关性有所顾虑,就必须让人类参与其中。”这种对何时自动化、何时增强人类能力的深刻理解,展现了深入运用人工智能技术所带来的智慧。

Their nuanced approach to balancing automation and human oversight is a masterclass in thoughtful AI implementation. In veterinary care, they ensure human oversight of all AI-generated content, while in retail operations, they allow for more automation. As Ellis astutely puts it, “Wherever we have a worry about the accuracy and relevance of the answer, the human needs to still be in the loop.” This sophisticated understanding of when to automate and when to augment human capabilities showcases the kind of wisdom that comes from deep engagement with AI technology.

低代码开发的影响

The Impact of Low-Code Development

他们转型过程中最令人兴奋的方面之一,或许就是他们使用微软 Copilot Studio 的经验。该经验揭示了低代码平台在加速 AI 代理部署方面的巨大颠覆性潜力。然而,这种开发的便捷性也带来了围绕治理和财务控制的诸多新挑战。Ellis 的观察与我们不谋而合:这些平台“部署和使用如此简单……成本也如此容易飙升”。他们创新性的应对之策是开发治理框架,既保持民主化人工智能代理开发的优势,又控制成本和质量。我们认为,随着这些技术的普及,这种平衡将变得越来越重要。

Perhaps one of the most exciting aspects of their transformation is their experience with Microsoft’s Copilot Studio, which revealed the game-changing potential of low-code platforms to accelerate AI agent deployment. However, this very ease of development created fascinating new challenges around governance and financial control. Ellis’s observation really resonates with us: these platforms are “so easy to deploy and use... so easy to rack up the costs.” Their innovative response was to develop governance frameworks that maintain the benefits of democratized AI agent development while keeping costs and quality under control—a balancing act that we believe will become increasingly critical as these technologies proliferate.

宠物居家护理的下一波创新浪潮

The Next Wave of Innovation for Pets at Home

当我们探索 Pets at Home 的突破性转型时,我们不禁被他们对未来的愿景所鼓舞。Ellis 的团队并非仅仅在应用现有技术,他们正在积极塑造未来几年企业人工智能的发展方向。他们与微软位于爱尔兰和西雅图的开发团队直接合作,以前所未有的方式拓展了企业人工智能代理的边界。

As we explore Pets at Home’s groundbreaking transformation, we can’t help but be energized by their vision of the future. Ellis’s team isn’t just implementing today’s technology—they’re actively shaping what enterprise AI will look like in the years to come. Working directly with Microsoft’s development teams in Ireland and Seattle, they’re pushing the boundaries of what’s possible with enterprise AI agents in ways we’ve never seen before.

Pets at Home 的前瞻性举措最令我们兴奋的是 Ellis 团队为来年制定的三项变革性发展计划。让我们逐一了解:

What truly excites us about Pets at Home’s forward-looking initiatives are three transformative developments that Ellis’s team has identified for the coming year. Let’s explore each one:

记忆演化

Memory Evolution

该团队正在攻克人工智能代理最具挑战性的方面之一——记忆能力。这不仅仅是存储信息的问题,而是要创建能够真正记住并从每一次互动中学习的人工智能助手。虽然埃利斯承认目前使用 Cosmos DB 等工具的解决方案只是权宜之计,但他的团队正在开拓新的方法,以实现更精细的个性化协助。这项进展对于打造真正人性化的深度个性化人工智能体验至关重要。

The team is tackling one of the most challenging aspects of AI agents—memory capabilities. This isn’t just about storing information; it’s about creating AI assistants that truly remember and learn from every interaction. While Ellis acknowledges that current solutions using tools like Cosmos DB are temporary, his team is pioneering approaches that will enable more sophisticated personalized assistance. This advancement is crucial for creating those deeply personalized AI experiences that feel genuinely human.

多智能体自主性

Multi-Agent Autonomy

他们工作中最具突破性的方面或许在于多智能体自主交互。他们已经在测试智能体之间的通信,无需人工干预,但同时又能保持治理和控制。埃利斯解释说:“我们目前正在与他们合作开发多智能体原型……我们开始测试智能体之间的交互,整个过程无需人工干预。” 尤其值得一提的是,他们是在低代码环境下实现的,这使得更广泛的组织都能使用这项技术。

Perhaps the most groundbreaking aspect of their work is in multi-agent autonomous interactions. They’re already testing agent-to-agent communications with no human in the loop, but doing it in a way that maintains governance and control. As Ellis explains, “We’re working with them on their multi-agent prototypes now... we’re starting to test agent-to-agent with no human in the loop.” What makes this particularly remarkable is they’re achieving this in a low-code environment, making it accessible to a broader range of organizations.

边缘人工智能和个人机器人

Edge AI and Personal Robotics

该团队预期,结合运行在设备上的小型语言模型,个人机器人和人机交互机器人领域将取得重大突破。这种组合有望从根本上改变我们在物理空间中与人工智能的交互方式。该方法使人工智能更贴近交互点,同时保持企业级的安全性和控制力。

The team anticipates a major breakthrough in personal robotics and human-edge robotics, combined with small language models running on devices. This combination could fundamentally change how we interact with AI in physical spaces. It’s an approach that brings AI closer to the point of interaction while maintaining enterprise-level security and control.

零售业的变革永续

Transforming Retail Forever

真正激发我们想象力的是埃利斯对零售业未来的大胆设想。他预言消费者与企业互动方式将发生根本性转变。“你以后不会再访问网站了,”他斩钉截铁地说。取而代之的是,顾客只需告诉他们的AI助手他们需要什么:“嘿,Gemini,去帮我买点东西。”

What truly captures our imagination is Ellis’s provocative vision for retail’s future. He predicts nothing less than a fundamental shift in how consumers interact with businesses. “You won’t go to a website anymore,” he states with conviction. Instead, customers will simply tell their AI assistants what they need: “Hey Gemini, go and buy me some stuff.”

这并非空穴来风——他们已经在构建支持这一愿景的基础设施。埃利斯的团队已在其组织内确定了超过140个人工智能代理的应用案例,展现了其巨大的变革潜力。尤其令人着迷的是他们如何推进这一规模化应用,他们不仅关注降低成本,更着眼于创造收入。正如埃利斯所指出的,“如果你能专注于那些更有利于提升收入的机会,那么收入增长是没有上限的……而成本效益始终是有限的。”

This isn’t just speculation—they’re already building the infrastructure to support this vision. Ellis’s team has identified over 140 use cases for AI agents across their organization, demonstrating the massive potential for transformation. What’s particularly fascinating is how they’re approaching this scale-up, focusing not just on cost reduction but on revenue generation. As Ellis points out, “If you can focus on opportunities that are more about improving revenue that’s uncapped... Cost efficiency is always capped.”

规模的挑战

The Challenge of Scale

他们发展历程中最引人入胜的方面之一,是他们如何应对在企业范围内扩展人工智能代理所面临的挑战。该团队开发了创新的方法来处理以下关键问题:

One of the most intriguing aspects of their journey is how they’re tackling the challenges of scaling AI agents across the enterprise. The team has developed innovative approaches to handling critical aspects like:

知识库间的数据一致性

Data consistency across knowledge bases

与现有系统的集成

Integration with existing systems

在快速变化的价格环境中进行成本管理

Cost management in a rapidly evolving pricing landscape

民主化人工智能发展的治理

Governance of democratized AI development

展望未来

Looking to the Future

埃利斯将这种变化的速度描述为遵循“双指数曲线”,发展速度前所未有。“每3个月、每6个月,变化就翻一番。这简直是摩尔定律的加强版,”他充满热情地说道。这种快速发展意味着组织需要做好持续变革的准备。

Ellis describes the pace of change as following a “double exponential curve,” with developments happening at an unprecedented rate. “Every 3 months every 6 months, it’s doubling. It’s Moore’s law on steroids,” he shares with infectious enthusiasm. This rapid evolution means organizations need to be prepared for continuous transformation.

Pets at Home 的故事之所以如此引人入胜,不仅仅在于其卓越的技术成就——尽管这些成就的确令人瞩目。更重要的是,他们能够在不断拓展人工智能应用边界的同时,始终坚持以人为本的理念。他们向我们展现了一个未来:科技不会取代人与人之间的互动,而是以我们目前还无法想象的方式,增强人与人之间的互动。

What makes Pets at Home’s story so compelling isn’t just their technical achievements—though those are remarkable. It’s their ability to maintain a human-centric focus while pushing the boundaries of what’s possible with AI. They’re showing us a future where technology doesn’t replace human interaction but enhances it in ways we’re only beginning to imagine.

埃利斯完美地捕捉到了当下科技时代的激动人心:“生活在科技领域,从事科技工作,真是令人兴奋!” 看看 Pets at Home 所取得的成就,我们对此深表赞同。他们从传统零售商转型为人工智能先驱的历程,为任何希望在人工智能驱动的未来蓬勃发展的企业提供了宝贵的经验。他们向我们证明,只要拥有正确的愿景、领导力和实施方法,企业人工智能的未来不仅充满希望,而且已经到来。

Ellis perfectly captures the excitement of this technological moment: “What an exciting time to live and be alive and work in technology.” Looking at what Pets at Home has achieved, we couldn’t agree more. Their journey from traditional retailer to AI pioneer offers invaluable lessons for any organization looking to thrive in this AI-driven future. They’ve shown us that with the right vision, leadership, and approach to implementation, the future of enterprise AI isn’t just promising—it’s already here.

跨职能和行业的智能体应用案例

Agentic Use Cases Across Functions and Industries

Pets at Home 的案例研究有力地展示了企业如何成功地在其运营中实施智能体人工智能。虽然他们的历程独一无二,但它揭示了一些适用于各行各业和各个职能部门的普遍原则。与 Pets at Home 一样,每个企业都需要首先确定与其战略目标和运营实际情况相符的合适用例。

The Pets at Home case study powerfully illustrates how an organization can successfully implement agentic AI across its operations. While their journey is unique, it demonstrates universal principles that apply across industries and functions. Like Pets at Home, every organization needs to start by identifying the right use cases that align with their strategic objectives and operational realities.

正如我们在第 8 章中所讨论的,识别并优先考虑合适的用例对于成功实施智能体人工智能至关重要。虽然 Pets at Home 确定了 140 多个与其业务相关的用例,但我们也收集了一套涵盖各个行业和职能的成熟用例,以帮助您快速开启自己的转型之旅。这些用例详见附录,并分为两大类:

As we discussed in Chapter 8, identifying and prioritizing the right use cases is crucial for successful agentic AI implementation. While Pets at Home identified over 140 use cases specific to their business, we have collected a comprehensive set of proven use cases across industries and functions to help you jumpstart your own transformation journey. These are detailed in the appendices, organized into two main categories:

企业级人工智能代理应用(附录 A)涵盖了六个关键领域的十五个经过验证的实现方案:

Enterprise AI Agent Applications (Appendix A) covers fifteen proven implementations across six key areas:

运营与供应链,包括生产协调和供应商沟通

Operations & Supply Chain, including manufacturing coordination and supplier communications

销售与收入管理,展示复杂的B2B销售流程编排

Sales & Revenue Management, showcasing complex B2B sales orchestration

客户体验与服务,包括医疗保健服务导航和银行服务协调

Customer Experience & Service, featuring healthcare access navigation and banking service coordination

风险、合规与安全,包括金融欺诈检测和监管文件

Risk, Compliance & Security, including financial fraud detection and regulatory documentation

知识工作与分析,涵盖竞争情报和市场研究

Knowledge Work & Analytics, covering competitive intelligence and market research

员工及行政服务,展示人力资源运营和IT服务管理

Employee & Administrative Services, demonstrating HR operations and IT service management

个人效率应用(附录 B)介绍了五种基本实现方式,它们通常是刚开启智能体人工智能之旅的组织的绝佳起点,包括电子邮件管理、日历优化和研究综合。

Personal Productivity Applications (Appendix B) presents five fundamental implementations that often serve as excellent starting points for organizations beginning their agentic AI journey, including email management, calendar optimization, and research synthesis.

我们鼓励您详细研究附录中的这些用例,并将它们作为您自身实施的灵感和实用蓝图。就像 Pets at Home 一样,您可能会发现,从重点突出、影响深远的用例入手,能够为更广泛的转型积蓄力量。请记住,成功的实施并非在于部署尽可能多的用例,而在于找到那些能够为您的组织创造最大价值,同时又能提升您进行更大规模转型能力的用例。

We encourage you to explore these use cases in detail in the appendices, using them as inspiration and practical blueprints for your own implementation. Like Pets at Home, you may find that starting with focused, high-impact use cases builds momentum for broader transformation. Remember, successful implementation isn’t about deploying as many use cases as possible, but rather about identifying the ones that will create the most value for your organization while building your capabilities for larger-scale transformation.

***

***

本章的案例研究和应用案例表明,人工智能代理的实施不仅仅关乎技术,更关乎工作方式和价值创造方式的重塑。从宠物之家(Pets at Home)雄心勃勃的人工智能驱动型客户互动愿景,到我们探讨的众多行业特定应用,我们看到人工智能代理正在以切实可衡量的方式改变着企业。

The case studies and use cases presented in this chapter demonstrate that AI agent implementation isn’t just about technology—it’s about reimagining how work gets done and how value is created. From Pets at Home’s ambitious vision of AI-driven customer interactions to the numerous industry-specific applications we’ve explored, we see how AI agents are already transforming businesses in tangible, measurable ways.

然而,这些应用也引发了关于未来工作和社会的深刻思考。在第五部分,我们将详细探讨这些更广泛的影响。日益复杂的人工智能代理将如何重塑工作的本质?随着这些技术的演进,又将涌现哪些新的机遇和挑战?这对人类劳动者、组织乃至整个社会意味着什么?这些问题并非纸上谈兵,而是任何着手进行人工智能代理转型之旅的组织都必须认真考虑的关键问题。

Yet these implementations also raise profound questions about the future of work and society. As we move into Part 5, we’ll explore these broader implications in detail. How will increasingly sophisticated AI agents reshape the nature of work itself? What new opportunities and challenges will emerge as these technologies evolve? What does this mean for human workers, organizations, and society as a whole? These questions aren’t just theoretical—they’re crucial considerations for any organization embarking on an AI agent transformation journey.

第五部分

PART 5

工作与社会的未来展望

FUTURE HORIZONS FOR WORK AND SOCIETY


在我们共同探索人工智能代理实施和扩展的实际应用过程中,一个更大的问题逐渐成形:当这项技术普及之后会发生什么?在探讨了如何构建人工智能代理(第三部分)以及如何利用它们改造组织(第四部分)之后,现在是时候放眼未来,思考这项技术对我们的工作、学习和生活方式的深远影响了。

As we’ve journeyed together through the practical realities of implementing and scaling AI agents, a larger question has been taking shape: What happens when this technology becomes ubiquitous? Having explored how to build AI agents (Part 3) and transform organizations with them (Part 4), it’s time to lift our gaze to the horizon and consider the profound implications for how we work, learn, and live.

这并非空穴来风。在我们实施智能体人工智能的过程中,我们亲眼目睹了这些技术如何重塑流程乃至人们的生活——有时甚至以出乎所有人意料的方式。最初以自动化项目起步的项目,往往最终演变为对工作本身的根本性重塑。我们所讨论的变革并非遥不可及,它们已经在世界各地的组织中初露端倪。

This isn’t idle speculation. Throughout our experience implementing agentic AI, we’ve witnessed firsthand how these technologies reshape not just processes but people’s lives—sometimes in ways nobody anticipated. What began as automation projects often evolved into a fundamental reimagining of work itself. The changes we’re discussing aren’t decades away; they’re already emerging in organizations around the world.

第五部分的目标并非准确预测未来——没有人能够做到这一点。相反,我们旨在帮助您系统地思考摆在我们面前的各种选择,无论这些选择是个人层面的还是集体层面的。人工智能时代,工作和社会的未来并非预先注定;它将由我们今天就如何开发、实施和管理这些技术所做的决策所塑造。

Our goal in Part 5 isn’t to predict the future with certainty—no one can do that. Instead, we aim to help you think systematically about the choices before us, both individually and collectively. The future of work and society in the age of AI agents isn’t predetermined; it will be shaped by the decisions we make today about how to develop, implement, and govern these technologies.

第十三章

CHAPTER 13

新的工作世界

THE NEW WORLD OF WORK

工作重塑:人与机器的交响曲

Work Reimagined: The Symphony of Human and Machine

在我们多年的咨询和研究工作中,我们见证了无数次战略会议,但我们在2025年初于一家全球房地产公司观察到的景象却令人叹为观止。高级项目经理塔拉(Tara)不仅与团队分享项目进展,她还在精心策划一场人类创造力与人工智能的巧妙协作。人工智能代理实时分析项目数据并识别风险,而塔拉独特的人类能力——我们称之为"人类特质"(Humics)——使她能够从团队动态、客户关系以及更广泛的业务影响等角度解读这些洞察。

In our years of consulting and research, we’ve witnessed countless strategy meetings, but what we observed at a global real estate company in early 2025 was truly remarkable. Tara, a senior project manager, wasn’t just sharing updates with her team—she was orchestrating a sophisticated collaboration between human creativity and artificial intelligence. While an AI agent analyzed project data and identified risks in real-time, Tara’s uniquely human abilities—what we call “Humics”—enabled her to interpret these insights through the lens of team dynamics, client relationships, and broader business impact.

塔拉沉思道:“真正令人着迷的不是人工智能能够处理复杂的分析,而是我们人类独特能力的开发,使我们能够创造出比人类或人工智能单独所能达到的更伟大的成就。”

“What’s fascinating,” Tara reflected, “isn’t that AI can handle complex analysis—it’s how developing our distinctly human capabilities has allowed us to create something greater than either humans or AI could achieve alone.”

这一观察深深触动了我们。通过我们的工作,我们深刻地认识到:未来不仅属于人工智能,更属于人类与机器能力的强大融合。

This observation deeply resonated with us. Through our work, we’ve come to a profound realization: the future belongs not to AI alone but to this powerful symphony of human and machine capabilities.

人机协作的演变

The Evolution of Human-Agent Collaboration

根据我们的经验,我们观察到人机协作领域出现了令人瞩目的发展进程。最引人注目的是,这种演进是如何在不同的复杂程度层级中展开的。在第一层级,我们看到的是基于规则的基本自动化——这种自动化可以处理重复性任务,但需要显式编程。第二层级则带来了智能自动化,人工智能可以利用机器学习处理更复杂的场景,但仍然需要在一定的参数范围内进行。

From our experience, we’ve observed a fascinating progression in human-agent collaboration. What’s most striking to us is how this evolution has unfolded across distinct levels of sophistication. At Level 1, we saw basic rule-based automation—the kind that could handle repetitive tasks but required explicit programming. Level 2 brought intelligent automation, where AI could handle more complex scenarios using machine learning but still within confined parameters.

真正的变革始于三级智能体工作流程。这些人工智能系统能够理解上下文,进行复杂的推理,并协调复杂的流程。正是在这里,我们看到了人机协作的首批真正意义上的案例,人工智能不再仅仅是工具,而是解决问题的伙伴。

The real transformation began with Level 3 agentic workflows. These AI systems could understand context, reason with sophistication, and orchestrate complex processes. This is where we saw the first genuine examples of human-agent collaboration, where AI wasn’t just a tool but a partner in problem-solving.

我们最引人入胜的经历之一来自我们曾合作过的一家制造企业——他们的转型历程至今仍令我们兴奋不已。他们的发展历程完美地诠释了我们所理解的人机协作的自然演进。他们最初采用基础的机器人流程自动化进行库存管理(一级),随后发展到人工智能驱动的需求预测(二级),最终部署了三级智能体系统,该系统能够自主管理整个供应链,实时应对各种突发状况并优化运营。在此过程中,人力并没有消失——而是转型专注于战略决策和监督,而人工智能则负责处理复杂的运营环节。

One of our most compelling experiences came from a manufacturing company we worked with—a transformation that still excites us when we share it. Their journey perfectly illustrates what we believe is the natural evolution of human-agent collaboration. They started with basic robotic process automation for inventory management (Level 1), progressed to AI-powered demand forecasting (Level 2), and finally implemented a Level 3 agentic system that could autonomously manage their entire supply chain, adapting to disruptions and optimizing operations in real-time. The human workforce didn’t disappear—it evolved to focus on strategic decisions and oversight while the AI handled operational complexities.

思考一下不同行业是如何重新构想工作的:

Consider how work is being reimagined across different sectors:

在医疗保健领域,我们有幸与多家领先机构合作,人工智能代理负责处理日常诊断和行政任务,使医生和护士能够专注于复杂病例和医患关系。尤其令人振奋的是,随着人工智能处理医疗护理的常规工作,人性化关怀的价值反而更高了,而不是降低了。尤其是在医疗保健领域,主流理念非常明确:人工智能服务于人类。如果运用得当,人工智能代理能够创造时间和空间,让医护人员在治疗患者的过程中更多地运用人性化的关怀。这种新的工作模式也让临床医生更有成就感。

In healthcare, where we’ve had the privilege of working with several leading institutions, AI agents handle routine diagnostics and administrative tasks, freeing doctors and nurses to focus on complex cases and patient relationships. What’s particularly inspiring is how the human touch becomes more valuable, not less, as AI handles the routine aspects of medical care. Especially in healthcare, the dominant mindset is clear: AI is in service of humans. AI agents, if led well, create the time and space for the interpersonal human touch to become a bigger part of treating patients. And the new mix of work is far more satisfying to clinicians.

在金融服务领域,我们观察到一种令人瞩目的转变:人工智能负责数据分析和风险评估,而人工顾问则专注于了解客户的人生目标,并在客户做出重大财务决策时提供情感支持。最终成果远超预期——二者结合的成效远胜于任何一方单独行动所能达到的水平。

In financial services, we’ve observed a fascinating shift: AI manages data analysis and risk assessment, while human advisors focus on understanding clients’ life goals and providing emotional support during major financial decisions. The results have exceeded our expectations—the combination delivers better outcomes than either could achieve independently.

我们最喜欢的变革之一发生在创意产业,人工智能负责处理制作中的技术环节——例如色彩校正、声音增强和场景连续性跟踪——而人类则专注于更高层次的创意指导和情感叙事。虽然这种转变仍处于早期阶段,但我们已经惊叹于这种合作模式所创造出的全新艺术表达形式,这些形式在以前是无法想象的。

One of our favorite transformations has been in creative industries, where AI handles technical aspects of production—like color correction, sound enhancement, and scene continuity tracking—while humans focus on higher-level creative direction and emotional storytelling. While it’s still the early days of this transition, we’re already amazed by how this partnership is creating new forms of artistic expression that weren’t possible before.

各层级的新角色

New Roles at Each Level

在我们的咨询工作中,我们亲眼见证了我们认为职场发展中最激动人心的趋势之一:随着人工智能能力逐级提升,新的角色不断涌现,而旧的角色也在不断转型。在第一层级,我们见证了自动化专家和流程分析师的崛起。第二层级则催生了对人工智能训练师和数据质量经理的需求。但真正令我们兴奋的是,第三层级正在创造全新的工作类别。

Throughout our consulting work, we’ve had a front-row seat to what we believe is one of the most exciting developments in workplace evolution: As AI capabilities progress through these levels, new roles emerge while others transform. At Level 1, we saw the rise of automation specialists and process analysts. Level 2 brought demands for AI trainers and data quality managers. But what truly excites us is how Level 3 is creating entirely new categories of work.

最令我们着迷的是人工智能协调者作为人类团队和人工智能系统之间关键桥梁的出现。他们既了解人类的需求,也了解人工智能的能力,从而确保两者之间的有效协作。我们最成功的案例之一是为一家金融服务公司提供咨询服务。在该案例中,协调者帮助团队转变了与人工智能的协作方式,从简单的任务自动化发展到复杂的决策支持系统,这些系统增强而非取代了人类的判断。

What fascinates us most is the emergence of AI Orchestrators as crucial intermediaries between human teams and AI systems. They understand both human needs and AI capabilities, ensuring effective collaboration between the two. One of our most successful cases was at a financial services firm we advised, where orchestrators helped transform how teams worked with AI, moving from simple task automation to complex decision support systems that enhanced human judgment rather than replacing it.

当然,随着人工智能系统自主性不断增强,决策透明度和伦理界限等问题亟需认真考量。正因如此,我们对伦理官和人工智能审计员的角色尤为重视。根据我们的经验,他们在确保人工智能决策符合人类价值观和组织原则方面发挥着至关重要的作用。我们永远不会忘记曾与一家医疗机构合作,该机构专门设立了一个部门,负责监控其人工智能系统在患者护理中的决策,从而确保其决策符合伦理规范,并能提供更精准的建议和决策。这为医生提供了一个可靠且负责任的工作环境,使他们能够更专注于治疗过程中的人际互动。

Of course, as AI systems gain more autonomy, questions about decision-making transparency and ethical boundaries need careful consideration. For this reason, we’ve become particularly passionate about the role of Ethics Officers and AI Auditors. In our experience, they play a crucial role in ensuring AI decisions align with human values and organizational principles. We’ll never forget working with a healthcare provider that created an entire department dedicated to monitoring their AI systems’ decision-making in patient care, ensuring both ethical compliance and more accurate recommendations and decisions, which provided a reliable and responsible work environment for human doctors to focus more on the interpersonal side of the treatment.

未来三大能力

The Three Competencies of the Future

通过我们的研究,我们总结出了在智能体时代取得成功所需的三项关键能力。正如《不可替代:人工智能时代脱颖而出的艺术》一书中详细阐述的那样,这三项能力是:变革就绪、人工智能就绪和人类就绪。194 关键在于理解,这些不仅仅是技能——它们是思维模式和方法,使人们能够与人工智能协同发展。

Through our work, we’ve identified what we believe are the three essential competencies needed for success in the agentic age. As detailed in the book “IRREPLACEABLE: The Art of Standing Out in the Age of Artificial Intelligence,” these are Change-Ready, AI-Ready, and Human-Ready capabilities.194 What’s crucial to understand is that these aren’t just skills—they’re mindsets and approaches that enable people to thrive alongside AI.

做好变革准备意味着在持续变革面前培养韧性和适应能力。这需要将变革视为成长的契机,而非威胁。我们曾合作过的一位高管将其描述为“培养适应的肌肉记忆”——正如运动员训练身体以做出本能反应一样,员工也需要培养接受和驾驭变革的能力。这始终是员工的一项重要素质,但在人工智能和智能体时代,它显得尤为关键。

Being Change-Ready means developing resilience and adaptability in the face of continuous transformation. It’s about viewing change not as a threat but as an opportunity for growth. One executive we worked with described it as “developing the muscle memory for adaptation”—just as athletes train their bodies to respond instinctively, workers need to build their capacity to embrace and navigate change. This has always been a useful attribute for employees, but it’s even more critical in the age of AI and agents.

人工智能就绪能力是指理解如何有效地与人工智能系统协同工作。这超越了技术知识的范畴,而是要培养对人工智能能力和局限性的直觉。员工需要学习何时依赖人工智能,何时运用人类的判断。我们曾为一家律师事务所提供咨询,该事务所的律师在学会将人工智能的数据分析能力与自身的战略思维和情商相结合后,案件结果显著改善。2016 年出版的《唯有人类才能胜任》(Only Humans Need Apply)一书将这种能力描述为"介入"——理解人工智能系统的工作原理,观察其性能,并根据需要对其进行改进。195

AI-Ready competency involves understanding how to work effectively with AI systems. This goes beyond technical knowledge—it’s about developing an intuitive sense of AI’s capabilities and limitations. Workers need to learn when to rely on AI and when to apply human judgment. A legal firm we advised saw dramatic improvements in case outcomes when their lawyers learned to combine AI’s data analysis capabilities with their own strategic thinking and emotional intelligence. The 2016 book Only Humans Need Apply described this competency as “stepping in”—understanding how AI systems work, observing their performance, and making them better as required.195

“人类就绪”能力侧重于培养人工智能无法真正复制的独特人类能力——我们称之为“人类特质”。这些特质包括真正的创造力(即对人类和社会有用且有意义的新颖想法,而不仅仅是对现有想法的重新组合)、批判性思维(包括伦理判断和直觉理解),以及社交真实性(基于共同价值观建立真正的人际关系的能力)。

The Human-Ready competency focuses on developing uniquely human capabilities that AI cannot authentically replicate—what we call the “Humics.” These include genuine creativity (which are novel ideas that are useful and meaningful to humans and society and thus not simply the recombination of existing ideas), critical thinking (including ethical judgment and intuitive understanding), and social authenticity (the ability to build genuine human connections based on shared values).

人类特质(Humics)的优势

The Humics Advantage

我们发现人类特质(Humics)方法最引人注目的地方——而且我们在各个行业都一致地看到了这一点——在于:与可能过时的技术技能不同,这些基本的人类能力是永恒的,并能作为肥沃的土壤,自然而然地孕育出新的、相关的技能:

What we find most compelling about the Humics approach—and we’ve seen this consistently across industries—is that, unlike technical skills that may become obsolete, these fundamental human abilities are timeless and serve as fertile soil from which new, relevant skills naturally emerge:

我们一次又一次地看到,真正的创造力远超人工智能对现有想法的简单重组。我们惊叹于专业人士如何构思出真正原创的概念,这些概念源于人类情感的深度和生活经验。我们曾为一家数字营销机构提供咨询,他们就是一个很好的例子。该机构的团队专注于提升自身的创造力,并开始开发全新的叙事方式,将数据洞察与情感共鸣相结合,这是人工智能本身无法实现的。

Time after time, we’ve seen how Genuine Creativity goes beyond AI’s ability to recombine existing ideas. We’ve watched in amazement as professionals conceive truly original concepts driven by human emotional depth and lived experience. A perfect example comes from a digital marketing agency we advised, where teams focused on enhancing their creative capabilities started developing entirely new approaches to storytelling that combined data insights with emotional resonance in ways AI alone couldn’t achieve.

通过我们的工作,我们逐渐认识到,批判性思维涵盖了我们做出细致判断、质疑假设以及应对复杂伦理问题的能力。提升批判性思维能力包括保持好奇心和提出问题,从而更有效地应对不确定性和复杂情况。

Through our work, we’ve come to understand that Critical Thinking encompasses our ability to make nuanced judgments, question assumptions, and navigate ethical complexities. Sharpening critical thinking includes the art of being curious and asking questions that enable you to deal with uncertainties and complex situations more effectively.

我们曾与一家金融服务公司合作,该公司提供了一个特别鼓舞人心的例子。该公司的顾问通过研讨会和案例讨论提升了批判性思维能力。因此,他们自然而然地发展出了在道德投资策略和客户整体评估方面的新能力——这些技能并非与人工智能的分析能力竞争,而是与之互补,从而能够做出更全面的决策。

A particularly inspiring example comes from a financial services firm we worked with, where advisors sharpened their critical thinking through workshops and case study discussions. As a result, they naturally developed new capabilities in ethical investment strategy and holistic client assessment—skills that didn’t compete with AI’s analytical power but complemented it, allowing for more well-rounded decision-making.

另一个令人受益匪浅的发现是,我们亲眼见证了社交真实性在促进真诚的人际关系、同理心和信任方面所发挥的强大作用。我们在一家医疗机构亲眼目睹了这一点。从业人员专注于通过患者沟通培训、角色扮演练习和导师指导项目来提升自身的社交能力。这种对人际交往的重视帮助他们发展出更强的沟通技巧和更全面的患者护理方法,从而与人工智能诊断工具和谐共处,而不是被其取代。

Another rewarding insight has been witnessing the power of social authenticity in fostering genuine human connection, empathy, and trust. We saw this in action at a healthcare provider, where practitioners focused on enhancing their social authenticity through patient communication training, role-playing exercises, and mentorship programs. This emphasis on human connection helped them develop stronger communication skills and a more holistic approach to patient care, working in harmony with AI diagnostic tools rather than being replaced by them.

从人类特质(Humics)到新技能

From Humics to New Skills

真正让我们兴奋的是,Humics 方法能够促进与不断变化的需求相契合的有机技能发展。下面我们分享一些我们最喜欢的例子:

What truly excites us about the Humics approach is how it enables organic skill development that is aligned with emerging needs. Let us share some of our favorite examples:

我们有幸与杰出的营销专家乔丹共事,他专注于培养自身的创造力和批判性思维。我们惊叹于这种基础能力如何自然而然地催生出创新营销活动设计和数据驱动型创意等新技能。乔丹并没有试图预测哪些具体的营销技能在未来会更有价值,而是通过投资于人类特质(Humics),使自己能够随着行业的演变自然地适应和创新。

Jordan, a remarkable marketing professional we had the pleasure of working with, focused on developing his genuine creativity and critical thinking. We were amazed to watch how this foundational capability led to the natural emergence of new skills like innovative campaign design and data-driven creativity. Rather than trying to predict which specific marketing skills would be valuable in the future, Jordan’s investment in Humics allowed him to adapt and innovate naturally as the field evolved.

埃琳娜的转变尤其令我们深受启发,她也提供了一个令人信服的例子。当人工智能代理开始在她所在的全球性银行进行投资组合分析和市场预测时,埃琳娜加倍努力提升自己的人文素养,特别是社交真诚度和批判性思维能力。接下来的发展甚至超出了我们的预期:她增强的同理心使她能够深刻理解客户与金钱的情感关系以及他们更广泛的人生目标。她的批判性思维使她能够将人工智能生成的市场洞察与客户的个人情况和价值观相结合。因此,她发展出了“整体财务生活规划”和“人工智能增强的金融情商”方面的新技能——这些能力都是在她不断强化的人文素养基础上自然而然产生的。

Elena, whose transformation particularly inspired us, provides another compelling example. When AI agents began handling portfolio analysis and market predictions at her global bank, Elena doubled down on developing her Humics, particularly her social authenticity and critical thinking. What happened next exceeded even our expectations: her enhanced empathy enabled her to deeply understand clients’ emotional relationships with money and their broader life goals. Her critical thinking allowed her to synthesize AI-generated market insights with clients’ personal circumstances and values. As a result, she developed new skills in “holistic financial life planning” and “AI-enhanced emotional intelligence in finance”—capabilities that emerged organically from her strengthened Humics foundation.

我们观察到的一个相关例子是一家资产管理公司,该公司开始实施一套用于财富管理的AI代理系统——也就是俗称的“智能投顾”。在一次关于这套新系统的采访中,一位财务顾问告诉我们:“我感觉到了智能投顾的脚步声。我并不想和它竞争,而是想更多地了解‘金融心理学’——例如,如何协调已婚客户中夫妻双方往往截然不同的观点。”

A related example we observed was at an asset management company that was beginning to implement a set of AI agents for wealth management—what is popularly called “robo-advice.” In an interview about the new system, a financial advisor told us, “I’m hearing the footsteps behind me on this robo-advisor thing. Rather than trying to compete with it, I’m trying to learn more about ‘financial psychiatry’—for example, trying to reconcile the often widely varying perspectives of husbands and wives in married couple clients.”

这两个案例最值得注意的是,新技能的习得并非出于强制或预先设定。正如健康的土壤自然而然地促进各种植物的生长一样,对人类特质(Humics)的深刻认识和重视,为相关技能的涌现创造了条件,使其能够适应不断变化的环境。乔丹和埃琳娜的经历表明,投资于这些基本的人类能力,能够使专业人士在人工智能能力不断扩展的今天,依然保持其价值和竞争力。

What’s particularly noteworthy in both cases is how the development of new skills wasn’t forced or predetermined. Just as healthy soil naturally supports the growth of various plants, strong awareness of and focus on Humics create the conditions for relevant skills to emerge in response to changing circumstances. Jordan’s and Elena’s experiences demonstrate how investing in these fundamental human capabilities enables professionals to remain relevant and valuable as AI capabilities expand.

新兴技能类别

Emerging Skill Categories

我们目前观察到一些以扎实的人类特质(Humics)为基础的新兴技能类别正在涌现。以下是一些示例:

We are currently observing the emergence of new skill categories rooted in strong Humics foundations. Here are a few examples:

情感创新将真正的创造力与社会真实性相结合,从而打造能引起深刻共鸣的人类体验。这种能力在产品设计和城市规划等领域日益凸显,这些领域的专业人士必须开发出既能满足实际需求又能满足情感需求的解决方案。

Emotional Innovation combines genuine creativity with social authenticity to craft deeply resonant human experiences. This skill is becoming increasingly evident in fields such as product design and urban planning, where professionals must develop solutions that address both practical and emotional human needs.

直觉式系统导航结合了批判性思维和社会真实性,用于管理涉及人类和人工智能要素的复杂系统。这项技能在医疗保健和供应链管理等行业尤为重要,因为在这些行业中,理解技术能力和人为因素之间的相互作用是成功的关键。

Intuitive Systems Navigation leverages critical thinking alongside social authenticity to manage complex systems involving both human and AI elements. This skill is particularly essential in industries like healthcare and supply chain management, where understanding the interplay between technological capabilities and human factors is critical for success.

最后,复杂伦理决策能力综合运用人类特质(Humics)的三大维度,应对人工智能驱动的世界中日益复杂的伦理挑战。这项技能在人工智能开发、医疗保健和金融服务等领域尤为重要,因为在这些领域,平衡技术能力、人类价值观和社会影响至关重要。

Finally, Complex Ethical Decision-Making draws on all three Humics dimensions to address the increasingly intricate ethical challenges of an AI-driven world. This skill is especially valuable in areas such as AI development, healthcare, and financial services, where balancing technical capabilities with human values and societal impact is paramount.

图像

图 13.1:智能体人工智能发展框架(来源:© Bornet 等人)

Figure 13.1: The Agentic AI Progression Framework (Source: © Bornet et al.)

支持人类特质(Humics)的培养

Supporting Humics Development

我们坚信,组织需要积极支持这些基础能力的培养。我们发现以下方法尤其有效:

We firmly believe that organizations need to actively support the development of these foundational capabilities. We’ve found the following approaches to be particularly effective:

我们最成功的策略之一是创造鼓励实验和反思的学习环境,让员工通过现实世界的挑战和经验来培养他们的人文素养。

One of our most successful strategies has been creating learning environments that encourage experimentation and reflection, allowing employees to develop their Humics through real-world challenges and experiences.

我们已经看到,实施能够认可和奖励基于人文技能发展的衡量体系,超越传统的绩效指标,重视与人工智能互补的人类能力,取得了显著的成果。

We’ve seen remarkable results from implementing measurement systems that recognize and reward the development of Humics-based skills, moving beyond traditional performance metrics to value human capabilities that complement AI.

根据我们的经验,特别有效的方法是设计工作流程,优化人类人文能力和人工智能能力之间的协作,确保两者发挥各自独特的优势,从而取得更好的结果。

What’s proven particularly powerful in our experience is designing work processes that optimize the collaboration between human Humics and AI capabilities, ensuring each contributes their unique strengths to achieve better outcomes.

我们发现,通过专注于培养人文素养,企业可以打造一支随着人工智能能力不断发展而始终保持相关性和价值的员工队伍。我们最重要的发现之一是,当我们首先培养这些基本的人类能力时,特定的新技能就会自然而然地涌现出来。

We’ve discovered that by focusing on developing Humics, organizations can build workforces that remain relevant and valuable as AI capabilities continue to evolve. One of our most important learnings has been that specific and new skills will emerge naturally when we nurture those fundamental human capabilities first.

随着我们深入迈入智能体时代,我们相信,最成功的组织将是那些能够创造环境,使人类能力与人工智能并驾齐驱的组织。通过专注于发展人类特质(Humics),我们为持续适应和创新创造了条件,确保即使人工智能能力不断扩展,人类员工仍然不可替代。

As we move deeper into the agentic age, we believe the most successful organizations will be those that create environments where human capabilities can flourish alongside AI. By focusing on developing the Humics, we create the conditions for continuous adaptation and innovation, ensuring that human workers remain irreplaceable even as AI capabilities expand.

未来的工作并非人类与人工智能的对抗,而是人与人工智能携手共进,发挥各自优势,谱写和谐的乐章。我们坚信,专注于发展人类独有的能力,才能为智能体时代的持续适应和创新奠定基础。那些能够蓬勃发展的组织,必将理解这一根本真理:人类的能力若能得到充分发展,便能使人类以前所未有的方式运用人工智能,创造出单凭一己之力无法企及的可能性,从而创造前所未有的价值。

The future of work isn’t about humans versus AI—it’s about creating a symphony where both play to their strengths. We believe passionately that by focusing on developing our uniquely human capabilities, we create the foundation for continuous adaptation and innovation in the agentic age. The organizations that thrive will be those that understand this fundamental truth: our human capabilities, when properly developed, will allow humans to employ AI in ways that create possibilities neither could achieve alone, securing value at unprecedented levels.

这次不一样了:智能人工智能的黎明

This Time Is Different: The Dawn of Agentic AI

“我把自己的工作自动化了,”黛比在我们为一家财富500强公司提供咨询服务期间坦言。作为一名资深项目经理,她刚刚部署了一个三级人工智能代理,它不仅协调了整个软件发布流程,还能从错误中学习并调整方法——就像她自己一样。“二十年来,我管理过无数技术项目,见证了无数工具的兴衰更替。但这可不是工具。它能像我一样思考。”

“I’ve automated myself out of a job,” Debbie confessed during our consulting engagement at a Fortune 500 company. As a veteran project manager, she had just implemented a Level 3 AI agent that not only coordinated an entire software release but learned from its mistakes and adapted its approach—much like she would. “In twenty years of managing tech projects, I’ve seen countless tools come and go. But this isn’t a tool. This thing thinks like me.”

那一刻将永远铭刻在我们心中。黛比的反应深深触动了我们。她的经历清晰地诠释了人工智能智能革命与以往技术变革的根本区别。我们不再只是旁观者,而是见证了前所未有的变革,无论从速度还是性质上来说,这都是一次史无前例的巨变。

That moment will stay with us forever. Debbie’s reaction resonated deeply with us. Her experience crystallizes why the agentic AI revolution fundamentally differs from previous technological transformations. We’re not just observers—we’re witnessing something unprecedented in both the velocity and the nature of change.

加速悖论

The Acceleration Paradox

在过去几十年里,我们致力于在各行各业推广自动化技术,亲历了无数次技术变革浪潮。但如今我们所看到的智能体人工智能的发展,既让我们兴奋又让​​我们担忧——它与我们以往的任何经验都截然不同。

Throughout our decades of implementing automation technologies across industries, we’ve had front-row seats to many waves of technological change. But what we’re seeing now with agentic AI fills us with both excitement and concern—it’s unlike anything in our experience.

从历史的角度来看,技术革命遵循着一个可预测的模式:颠覆、适应和最终的平衡。工业革命对体力劳动的自动化历经数代人的时间才得以实现。数字革命对认知任务的计算机化也持续了数十年。每一次浪潮都给了社会时间来调整其教育体系、劳动力市场和社会结构。

Looking back through the lens of history, technological revolutions followed a predictable pattern: disruption, adaptation, and eventual equilibrium. The Industrial Revolution’s automation of physical labor unfolded over generations. The Digital Revolution’s computerization of cognitive tasks spanned decades. Each wave gave society time to adapt its educational systems, labor markets, and social structures.

但真正令我们着迷、坦白说有时也令我们感到担忧的是,智能体人工智能打破了这种模式。让我们分享一个最引人注目的经验:我们最近为一家制造企业部署了一个三级智能体来管理供应链。接下来发生的事情甚至让我们都感到惊讶——仅仅47天,它就能处理以前只有经验最丰富的专业人员才能完成的复杂物流决策。更令人惊叹的是它的进化方式——它从每一次交互中学习,不断改进决策,最终在需要理解上下文的领域超越了人类的表现。

But what fascinates and, frankly, sometimes alarms us is how agentic AI shatters this pattern. Let us share one of our most striking experiences: We recently implemented a Level 3 agent to manage the supply chain for a manufacturing client. What happened next amazed even us—within 47 days, it was handling complex logistics decisions that previously required their most experienced professionals. Even more remarkable was how it evolved—teaching itself from each interaction, refining its decision-making, and eventually surpassing human performance in areas requiring contextual understanding.

这种快速转型使我们发现了所谓的“适应悖论”:随着与人工智能协同工作所需的技能日益精进,培养这些技能的时间却急剧缩短。我们在一家金融服务公司亲眼目睹了这种情况,该公司员工只有不到六个月的时间从处理交易过渡到协调复杂的人工智能工作流程——而这种转变在传统上需要数年时间逐步提升技能。

This rapid transformation led us to identify what we consider the “adaptation paradox”: just as the skills needed to work alongside AI become more sophisticated, the time available to develop these skills dramatically shrinks. We witnessed this firsthand at a financial services firm we advised, where employees had less than six months to transition from processing transactions to orchestrating complex AI workflows—a shift that traditionally would have taken years of gradual upskilling.

我们在一家咨询过的科技公司也目睹了类似的挑战。该公司人工智能开发团队的技能每三个月就会过时,因为他们的智能体也在不断进化。尤其令我们震惊的是,传统的季度培训模式完全失效——员工掌握新技能时,技术又已经取得了新的进展。这次经历让我们意识到一个突破性的教训:我们需要彻底重新思考职业发展模式,从周期性培训转向持续的、人工智能辅助的学习。

We also witnessed this challenge at a tech company we advised, where their AI development team’s skills were becoming obsolete every three months as their agents evolved. What particularly struck us was how traditional quarterly training cycles proved futile—by the time employees mastered new skills, the technology had advanced further. This experience led to what we consider a breakthrough realization: the need for a radical rethinking of professional development, shifting from periodic training to continuous, AI-assisted learning.

颠覆的本质

The Nature of Disruption

通过我们与人工智能实施的大量合作,我们逐渐认识到,当我们审视智能体人工智能发展框架时,这种颠覆性变革的独特特征便清晰可见。一级和二级智能体遵循着自动化发展的历史模式——取代常规的体力劳动和认知任务。然而,我们发现三级智能体真正具有革命性意义的地方在于它们本质上的差异:它们不仅执行任务,还能理解上下文、从经验中学习、做出细致入微的决策,并提供指导建议。这一现实清楚地表明,人工智能代理正变得几乎具有自我反思能力,因为它们能够分析自身的决策过程并自主改进算法。

Through our extensive work with AI implementations, we’ve come to understand that the unique character of this disruption becomes clear when we examine the Agentic AI Progression Framework. Level 1 and 2 agents followed the historical pattern of automation—replacing routine physical and cognitive tasks. However, what we find truly revolutionary about Level 3 agents is how fundamentally different they are. They don’t just execute tasks; they understand context, learn from experience, make nuanced decisions, and provide steering recommendations. This reality makes clear that AI agents are becoming almost self-reflective as they can analyze their own decision-making processes and refine their algorithms autonomously.

我们最令人大开眼界的经历之一,来自我们合作的一家律师事务所。他们引入了一位代理来审核合同——这项工作通常需要多年的专业培训。最让我们惊讶的是,短短几个月内,这位代理不仅处理文件的速度更快,还能识别出一些连资深律师都偶尔会忽略的细微法律风险。但真正让我们惊叹的是,看到它能从每一份合同中学习,不断提升对法律细微差别和商业背景的理解。

One of our most eye-opening experiences came from a legal firm we worked with that implemented an agent to review contracts—traditionally a task requiring years of specialized training. What surprised us most was that within months, the agent wasn’t just processing documents faster; it was identifying subtle legal risks that even senior lawyers occasionally missed. But what truly amazed us was watching it learn from each contract, continuously improving its understanding of legal nuance and business context.

这一转变使我们意识到一个最重要的事实:传统的适应策略可能行不通。以往的技术革命使人类得以“向上攀升”,从事更复杂的认知工作。但如今,我们面临着前所未有的挑战:当人工智能能够学习、推理和适应时,人类还能攀登到怎样的更高境界?通过与众多机构的合作,我们发现答案不在于与人工智能竞争,而在于从根本上重新构想人机协作模式。

This transformation has led us to one of our most important realizations: traditional adaptation strategies may fail. Previous technological revolutions allowed humans to move “up the value chain” to more complex cognitive work. But we’re now facing an unprecedented challenge: when AI can learn, reason, and adapt, what’s the higher ground to which humans can move? Through our work with numerous organizations, we’ve discovered that the answer lies not in competing with AI but in fundamentally reimagining human-AI collaboration.

从历史中汲取经验,同时开辟新天地

Learning from History While Breaking New Ground

我们承认,这场变革史无前例,但我们的研究和经验表明,历史蕴含着至关重要的经验教训。我们发现,以往革命中成功的转型都具有一些共同要素:积极主动的适应、重视人类独有的能力以及强有力的制度支持。然而,当前形势的挑战在于,这些经验教训必须以前所未有的速度和规模加以应用。

While we acknowledge that this transformation is unprecedented, our research and experience have shown us that history offers crucial lessons. We’ve identified that successful transitions in previous revolutions shared common elements: proactive adaptation, focus on uniquely human capabilities, and strong institutional support. But what makes our current situation particularly challenging is that these lessons must be applied at an unprecedented pace and scale.

通过与具有前瞻性思维的组织的合作,我们发现,成功完成转型的企业并非将人工智能代理视为工具,而是将其视为需要全新管理方式的合作伙伴。尤其令人鼓舞的是,这些企业在充分利用人工智能的分析和学习能力的同时,也注重培养员工独特的人类能力——创造力、情商和复杂问题解决能力。

Through our work with forward-thinking organizations, we’ve observed that companies successfully navigating this transition treat AI agents not as tools but as collaborators requiring new management approaches. What’s particularly encouraging is how they focus on developing their workforce’s uniquely human capabilities—creativity, emotional intelligence, and complex problem-solving—while leveraging AI’s analytical and learning capabilities.

前进之路

The Path Forward

站在这个转折点,我们比以往任何时候都更加确信,智能体人工智能不仅代表着又一波自动化浪潮,更是人机协作方式的根本性转变。让我们夜不能寐的并非这场变革是否会发生——它已经在悄然展开。在我们看来,真正重要的是确保这场变革能够惠及组织、个人和社会。

As we stand at this inflection point, we’re more convinced than ever that agentic AI represents not just another wave of automation but a fundamental shift in human-machine collaboration. What keeps us up at night isn’t whether this change will happen—it’s already unfolding. What truly matters, in our view, is ensuring this transformation benefits organizations, individuals, and society.

在接下来的章节中,我们将分享我们认为在新形势下蓬勃发展的切实策略。我们承认,适应的窗口期可能比以往任何时候都更短,但我们依然保持乐观,因为我们亲眼目睹了那些理解并拥抱变革的组织和个人如何能够抓住非凡的机遇。我们的目标是帮助人们不仅能够度过这场变革,更能从中蓬勃发展。

In the chapters ahead, we’ll share what we believe are concrete strategies for thriving in this new reality. While we acknowledge that the window for adaptation may be shorter than ever, we remain optimistic because we’ve seen firsthand how organizations and individuals who understand and embrace this change can unlock extraordinary opportunities. Our goal is to help individuals not just survive this transformation but to thrive in it.

人工智能时代教育的革新

Reinventing Education in the Age of AI Agents

随着智能人工智能以前所未有的速度从根本上重塑我们的世界,我们必须认识到,传统的适应策略已不再适用。因为时代确实不同,我们需要重新思考人类技能和能力的根基。如果我们想要成功应对这场变革,我们必须回归本源,重新定义在人类与日益复杂的人工智能代理作为合作伙伴而非竞争对手的世界里,接受教育的意义。

As agentic AI fundamentally reshapes our world at unprecedented speed, we must recognize that traditional adaptation strategies will no longer suffice. Because this time is truly different, we need to reconsider the very foundation of human skills and capabilities. If we are to navigate this transformation successfully, we must return to first principles and redefine what it means to be educated in a world where humans and increasingly sophisticated AI agents collaborate as partners rather than competitors.

现代教育危机

The Crisis in Modern Education

人工智能的崛起正在从根本上改变我们的生活和工作方式,然而我们的教育体系自工业革命以来却几乎没有改变。以当今典型的课堂为例:学生们仍然死记硬背那些他们很容易就能从人工智能那里获取的信息,练习着很快就会被自动化取代的技能,遵循着无法培养未来至关重要的、人类独有能力的标准化课程。学生们非但没有被培养批判性思维,反而被阻止质疑他们所学的内容。这种工业化的教育方式在信息匮乏、日常认知任务由人类完成的时代或许行得通。但在一个人工智能可以即时获取和处理海量信息的世界里,我们的教育重点必须发生根本性的转变。

The rise of AI agents is fundamentally transforming how we live and work, yet our education systems have remained largely unchanged since the Industrial Revolution. Consider a typical classroom today: students still memorize facts they could easily retrieve from an AI agent, practice skills that will soon be automated, and follow standardized curricula that fail to nurture the uniquely human abilities that will matter most in the future. Instead of inculcating critical thinking, students are discouraged from questioning what they are taught. This industrialized education approach made sense in an era when information was scarce and routine cognitive tasks were performed by humans. But in a world where AI agents can instantly access and process vast amounts of information, our educational priorities must shift dramatically.

回归教育的真正目的

Returning to Education’s True Purpose

我们在研究这个课题时,最引人入胜的发现之一是,“学校”(school)一词源于希腊语“skholē”,意为“休闲”或“休息”。我们认为这个词源意义深远,因为它揭示了教育最初目的的一个深刻真理:教育并非为了让劳动者从事工作,而是为了提供空间,让人们进行深思熟虑,探索人生的根本问题。古希腊人将教育视为帮助人们找到人生目标、充分发挥自身潜能的途径,这一点令我们深受启发。当然,这同样也是大学博雅教育的初衷,但随着时间的推移,这种理念已逐渐式微。

One of our most fascinating discoveries in researching this topic was that the word “school” comes from the Greek word “skholē,” meaning “leisure” or “rest.” We find this etymology deeply revealing because it uncovers a profound truth about education’s original purpose: it wasn’t about preparing workers for jobs, but about providing space for thoughtful reflection and exploration of life’s fundamental questions. What inspires us about the ancient Greeks’ approach is how they saw education as a means to help people find their purpose and develop their full potential as human beings. That was, of course, the original idea behind liberal arts education in colleges as well, but the idea has lost popularity over time.

我们认为,这种历史视角有力地批判了当今以职业为导向的教育方式,在这种方式下,成功往往以考试成绩和就业率来衡量。相反,我们认为教育应该培养个人发展、成长和反思的动力,从而寻找加速自身发展的机会。

In our view, this historical perspective offers a powerful critique of today’s career-oriented approach, where success is often measured by test scores and job placement rates. Instead, we believe education should equip individuals with the drive to develop and grow, and should foster reflective thinking that seeks out opportunities to accelerate that development.

理解人类发展的根本目的,才能更好地造福人类,因为这将赋予我们的行动意义,并让我们意识到身边存在的机遇(例如当今的人工智能),从而从中受益。随着人工智能代理能够越来越多地处理日常认知任务并帮助创造机遇,我们认为现在是时候回归这种更全面的教育观了——这种教育观强调的是人的发展而非职业培训。从这个角度来看,哲学、心理学以及对技术的行为学研究,应该在我们整体的教育体系中占据更加核心的地位。

Humanity is best served by understanding what purpose underlies human development, as it will provide meaning to our actions and create awareness about the opportunities (such as AI today) around us to benefit from. As AI agents can increasingly handle routine cognitive tasks and help facilitate opportunity creation, we believe it’s time to return to this more holistic view of education—one that emphasizes human development over job training. From this perspective, philosophy, psychology, and behavioral approaches to technology should become more central to our overall education approach.

通过三大能力重塑教育

Reimagining Education Through the Three Competencies

本书前文已探讨了未来三大核心能力,现在让我们来看看它们如何重塑我们的教育方式。这些核心能力不仅仅是需要教授的技能,它们更应该是我们构建所有学习体验的组织原则。

Having explored the Three Competencies of the Future earlier in this book, let’s examine how they can reshape our approach to education.196 These competencies aren’t just skills to be taught—they should be the organizing principles around which we structure all learning experiences.

在实践中,这意味着要从这些能力的角度来改造传统学科。例如,数学应该从死记硬背公式(人工智能可以胜任)转向培养解决问题的策略和创造性地应对数学挑战的方法,从而让数学通过探索其影响而变得生动有趣。我们最喜欢的例子之一是文学课如何减少对情节的记忆,而更多地关注深度分析、情感理解,以及对人工智能代理无法真正理解的人类经验的探索。

In practice, this means transforming traditional subjects through the lens of these competencies. Mathematics, for instance, should shift from memorization of formulas (which AI can handle) to developing problem-solving strategies and creative approaches to mathematical challenges so that math comes alive by exploring its impact. One of our favorite examples is how literature classes can focus less on plot recall and more on deep analysis, emotional understanding, and the exploration of human experiences that AI agents cannot truly comprehend.

全球教育创新

Global Innovation in Education

在我们的研究和实践中,我们见证了一些国家率先推行符合这些原则的教育改革,这些案例令人深受启发。芬兰的课程改革尤其给我们留下了深刻的印象,它强调跨学科学习和解决实际问题。芬兰的“现象学学习”方法打破了传统的学科界限,这一点令我们着迷。我们亲眼目睹了学生们在学习气候变化的过程中,如何同时接触科学数据(培养人工智能分析能力)、探索创新解决方案(增强人本能力),并随着新信息的出现不断调整理解(培养变革适应能力)。197

Throughout our research and experience, we’ve witnessed some truly inspiring examples of countries pioneering educational reforms that align with these principles. We’ve been particularly impressed by Finland’s transformation of its curriculum to emphasize interdisciplinary learning and real-world problem-solving. What fascinates us about their approach, called “phenomenon-based learning,” is how it breaks down traditional subject boundaries. We’ve seen firsthand how, when studying climate change, students simultaneously engage with scientific data (developing AI-Ready analytical skills), explore creative solutions (strengthening Human-Ready capabilities), and adapt their understanding as new information emerges (building Change-Ready competencies).197

我们发现丹麦的教育体系及其“课堂时间”(Klassens Tid)项目也极具吸引力。自1993年以来,6至16岁的学生每周都会花一个小时培养情商和社交能力。我们喜欢这些课程的原因在于,学生们学习如何应对复杂的社交情境、解决冲突并建立真诚的人际关系——我们相信,随着人工智能代理处理越来越多的日常互动,这些技能将变得越来越重要。198

Another model that we find particularly compelling is Denmark’s education system with their “Klassens Tid” (Class Time) program. Since 1993, students aged 6-16 spend one hour each week developing emotional intelligence and social authenticity. What we love about these sessions is how students learn to navigate complex social situations, resolve conflicts, and build genuine human connections—skills that we believe become increasingly valuable as AI agents handle more of our routine interactions.198

我们研究过的最具前瞻性的方法之一是新加坡的“技能创前程”计划,该计划展示了教育系统如何发展以支持终身学习。我们尤其对他们的创新方法感到兴奋,包括定期进行技能预测以预见未来需求,灵活的学习路径使学生能够将传统学术知识与实践技能相结合,将人工智能工具融入学习过程并同时注重人的发展,以及高度重视以项目为基础的学习,从而培养学生的适应能力和创造力。199

One of the most forward-thinking approaches we’ve studied is Singapore’s “SkillsFuture” initiative, which demonstrates how education systems can evolve to support lifelong learning. We’re particularly excited about their innovative approaches, including regular skills forecasting to anticipate future needs, flexible learning pathways that allow students to combine traditional academics with practical skills, integration of AI tools into the learning process while maintaining focus on human development, and a strong emphasis on project-based learning that develops adaptability and creativity.199

这些新的教育方法并非仅适用于中小学生。东北大学校长约瑟夫·奥恩 (Joseph Aoun) 于 2017 年出版了《机器人证明》(Robot Proof ) 一书,书中阐述了人工智能时代大学教育的新思路。他认为,鉴于人工智能的能力,仅仅向大学生灌输知识已经不再合适。相反,他相信大学应该培养学生的创造性思维和创造对社会有价值的事物的能力,无论是艺术作品还是疾病的新疗法。

These new approaches to education don’t apply only to elementary and secondary school students. Joseph Aoun, the president of Northeastern University, published a book in 2017 entitled Robot Proof that defined a new approach to university education in the age of AI.200 He argued that stuffing college students’ minds with facts was no longer appropriate, given the capabilities of AI. Instead, he believes colleges should cultivate a creative mindset and the ability to create something valuable to society, whether it’s an artistic work or a new treatment for a disease.

明日教室

The Classroom of Tomorrow

我们认为,未来的课堂应该被构想为学生学习如何运用人工智能,同时增强自身独特人类能力的场所。学生不应在人工智能更擅长的任务上与它竞争,而应专注于发展能够促进人机协作的互补能力。这些能力中有些需要理解计算机和人工智能的思维方式,但大多数更侧重于人类自身。

We believe we should envision future classrooms as spaces where students learn to leverage AI while strengthening their uniquely human capabilities. Instead of competing with AI at tasks it does better, students should focus on developing complementary abilities that enhance human-AI collaboration. Some of those abilities will involve understanding how computers and AI think, but the majority are more human-focused.

我们已经看到这种转变在各个学科领域以各种方式成功展现出来。我们最喜欢的例子之一是写作课:学生们使用人工智能代理进行基本的草稿撰写和编辑,但将精力集中在发展独特的声音和引人入胜的叙事上。这对许多学生来说是一次重大转变,他们过去被鼓励只需完成写作作业即可,现在则需要成为富有创造力的写作者和批判性编辑。我们也看到一些科学项目,人工智能负责数据分析,而学生们则专注于提出创造性的假设并以新颖的方式解读结果。此外,我们还观察到一些历史课利用人工智能提供事实信息,使学生能够更深入地探讨历史模式、人类动机和伦理影响。

We’ve seen this transformation successfully manifest in various ways across subjects. One of our favorite examples is in writing classes, where students use AI agents for basic drafting and editing but focus their energy on developing unique voices and compelling narratives. This is a major pivot for many students, who have been encouraged to just complete a writing assignment and move on. Now, they need to become creative writers and critical editors. We’ve also seen science projects in which AI handles data analysis while students focus on forming creative hypotheses and interpreting results in novel ways. And we’ve observed history lessons that leverage AI to provide factual information, freeing students to engage in deeper discussions about historical patterns, human motivation, and ethical implications.

教师角色的演变

The Teacher’s Evolving Role

在这种新的教育模式下,教师的角色不再仅仅是信息的提供者,而是学习的架构师。他们的角色转向设计能够培养学生独特人类能力的学习体验,引导学生与人工智能代理进行有效协作,促进有意义的讨论和更深入的理解,并帮助学生认识和强化自身独特的人类贡献。例如,教师成为教练,引导学生从内容(人工智能生成内容并提供建议)过渡到知识(人类运用内容来理解自身的业务现实)。

In this new educational paradigm, teachers become less information providers and more learning architects. Their role shifts toward designing experiences that develop students’ uniquely human capabilities, guiding students in effective collaboration with AI agents, facilitating meaningful discussions and deeper understanding, and helping students recognize and strengthen their distinctly human contributions. For example, teachers become coaches, facilitating a transition from content (AI generates content and provides recommendations) to knowledge (humans applying content to make sense of their own business reality).

前进之路

The Path Forward

或许无需赘言,这种转型需要我们在教育的结构和实施方式上做出重大改变。我们坚信,必须重新设计评估方法,以评估人的能力而非记忆的知识;必须加大对教师培训的投入,重点关注人工智能的融合和人际技能的培养;必须创建灵活的学习环境,既支持个人探索,也支持协作学习;必须开发新的课程,强调人的发展与技术能力的提升。从大学校长到小学校长,所有教育领导者都将参与其中。他们需要成为精通人工智能的变革管理者,就像企业高管需要承担这样的角色一样。

Perhaps needless to say, this transformation requires significant changes in how we structure and deliver education. We strongly believe we must redesign assessment methods to evaluate human capabilities rather than memorized knowledge, invest in teacher training focused on AI integration and human skill development, create flexible learning environments that support both individual exploration and collaborative learning, and develop new curricula that emphasize human development alongside technical competency. Educational leaders—from university presidents to elementary school principals—will need to become AI-savvy change managers, just as corporate executives need to adopt such roles.

真正让我们夜不能寐的是,我们意识到事关重大。我们社会和经济的成功运转都依赖于这场教育变革。随着人工智能代理变得越来越复杂,我们亲眼目睹了能够适应的人和不能适应的人之间的差距正在不断扩大。但我们也充满希望,因为通过围绕这些原则重塑教育,我们可以确保子孙后代不仅能够在人工智能代理的世界中生存,而且能够在其中蓬勃发展,以他们独特的方式保持不可替代的地位。

What keeps us up at night is the realization that the stakes couldn’t be higher. The successful functioning of our societies and our economies depends upon this educational transformation. As AI agents become more sophisticated, we’re seeing firsthand how the gap between those who can adapt and those who cannot continues to widen. But we’re also filled with hope because by reinventing education around these principles, we can ensure that future generations are not just prepared to survive in a world of AI agents, but to thrive in it, remaining irreplaceable in their own unique ways.

在这个新时代,衡量教育成功的标准不再是学生能记住多少信息或能多好地完成标准化任务,而是学生如何有效地运用自身独特的人类能力与人工智能体协作,解决复杂问题并创造新的价值。这才是人工智能时代教育真正的挑战和机遇。

The measure of educational success in this new era won’t be how much information students can retain or how well they can perform standardized tasks. Instead, success will be measured by how effectively students can apply their uniquely human capabilities in collaboration with AI agents to solve complex problems and create new value. This is the true challenge—and opportunity—of education in the age of AI agents.

第十四章

CHAPTER 14

代理人时代的社会

SOCIETY IN THE AGE OF AGENTS

在智能体驱动的世界中重新构想人类潜能

Reimagining Human Potential in an Agent-Powered World

人们面对人工智能的崛起,首先提出的问题往往是关于工作——它们会抢走我的饭碗吗?但或许我们问错了问题。真正让我们兴奋的是另一种可能性:如果自主智能体的出现非但没有威胁我们的生计,反而为我们带来革命性的变革,让我们有机会从根本上重新构想现代社会中“人”的意义,那又会怎样呢?

The first question people ask when confronting the rise of AI agents is often about jobs—will they take mine? But perhaps we’re asking the wrong question. What excites us most is a different possibility: what if, instead of threatening our livelihoods, the emergence of autonomous agents offers us something revolutionary: the opportunity to fundamentally reimagine what it means to be human in the modern world?

我们所熟知的“工作”时代即将终结?

The End of Work as We Know It?

让我们先从一个令人不适却又不得不面对的事实说起:如今许多工作都让人感到极度空虚。我们在各个组织中都亲眼目睹了这一点,数据也证实了我们的观察。根据盖洛普公司的一项广泛研究,全球高达77%的员工表示对工作缺乏热情。201最令我们痛心的是,他们形容自己的日常工作重复乏味,最终毫无成就感。我们遇到的一个特别引人注目的例子是,仅在美国,就有超过150万人每天从事重复性的簿记和会计工作,这些工作令人精疲力竭,却无法带来任何满足感。202

Let’s begin with what we consider an uncomfortable but necessary truth: much of today’s work is deeply unfulfilling. We’ve seen this firsthand in organization after organization, and the data confirms our observations. According to extensive Gallup research, a staggering 77% of employees worldwide report feeling disengaged from their work.201 What breaks our hearts is hearing how they describe their daily tasks as repetitive, tedious, and ultimately unrewarding. One particularly striking example we’ve encountered is that in the United States alone, over 1.5 million people spend their days performing repetitive bookkeeping and accounting tasks that drain rather than fulfill.202

但真正令我们震惊的是,工作不仅令人感到空虚,它还在实实在在地夺走我们的生命。我们发现的统计数据令人触目惊心:国际劳工组织报告称,与工作相关的压力和由此引发的疾病每年导致近300万人死亡,203社会成本约为3万亿美元。204为了更好地理解这场危机的严重性,不妨想想,死于工作相关原因的人数是死于交通事故的两倍,205是死于战争的十倍。206我们已经确信,目前的制度根本无法持续。

But what truly alarms us is that work isn’t just unfulfilling—it’s literally killing us. The statistics we’ve uncovered are shocking: The International Labour Organization reports that work-related stress and resulting illnesses lead to nearly 3 million deaths annually,203 with a societal cost of approximately $3 trillion.204 To help grasp the magnitude of this crisis, consider that twice as many people die from work-related causes as from road accidents,205 and ten times more than from war.206 We’ve become convinced that the current system simply isn’t sustainable.

通过在各行业实施人工智能解决方案,我们见证了人工智能代理如何通过不断提升自身能力而发展壮大。智能体的能力层级不断提升——从基本的基于规则的自动化到更复杂的自主操作。随着技术成熟和能力的提升,它们为我们重塑未来提供了契机。虽然我们的经验表明,目前的1-3级智能体主要作用是增强人类的能力,但我们对4级乃至最终的5级智能体的发展前景感到振奋,这预示着未来机器或许能够承担相当一部分传统经济活动。

Through our work implementing AI solutions across industries, we’ve witnessed how AI agents progress through increasing levels of capability—from basic rule-based automation to more sophisticated autonomous operations. As they mature in technological sophistication and capability, they offer us a chance to reshape this reality. While our experience shows that current Level 1-3 agents are primarily augmenting human capabilities, we’re excited by how the progression toward Level 4 and, eventually, Level 5 agents points to a future where machines could handle a significant portion of traditional economic activity.

从历史角度看解放

A Historical Perspective on Liberation

我们最喜欢的历史洞见之一来自1930年,当时经济学家约翰·梅纳德·凯恩斯写了一篇题为《我们子孙后代的经济前景》的文章。207他的远见卓识最令我们着迷的是,他设想了一个工作时间大幅缩短的未来。他预测,到21世纪,技术进步将使每周工作15小时成为可能——按今天的标准来看,这还不到两天。尤其令我们印象深刻的是,他的愿景并非着眼于失业,而是将人们从繁重而枯燥的工作中解放出来。

One of our favorite historical insights comes from 1930, when economist John Maynard Keynes wrote an essay titled “Economic Possibilities for Our Grandchildren.”207 What fascinates us about his vision is how he envisioned a future with dramatically reduced working hours. He predicted that by the 21st century, technological advancement would enable 15-hour workweeks—less than two full days by today’s standards. What’s particularly striking to us is that his vision wasn’t about unemployment but about liberation from back-breaking and mind-numbing work.

在与行业领袖的交流中,我们也听到了类似的预测。像马云这样的科技领袖曾表示,人工智能可以将每周工作时间缩短至三天,每天工作四小时。208比尔·盖茨最近也做出了类似的预测。209虽然这听起来像是乌托邦,但我们尤其受到早期缩短工作时间实验的启发,这些实验取得了令人鼓舞的成果。我们最喜欢的例子之一是新西兰的Perpetual Guardian公司,这是一家遗产规划公司,该公司在其240名员工中试行了每周32小时的工作制,且没有降低员工薪资。结果甚至让我们都感到惊讶:员工压力水平降低,敬业度提高,生产力提高了30-40%。210

In our conversations with industry leaders, we’ve heard similar predictions. Tech leaders like Jack Ma have suggested that AI could reduce the workweek to just three days, with four-hour workdays.208 Bill Gates recently made a similar prediction.209 While this might sound utopian, we’ve been particularly inspired by early experiments with reduced work schedules yielding promising results. One of our favorite examples is New Zealand’s Perpetual Guardian, an estate planning company that trialed a 32-hour workweek for its 240 employees without reducing salaries. The results amazed even us: stress levels decreased, engagement improved, and productivity surged by 30-40%.210

在英国,61家公司参与了一项每周工作四天的实验,员工薪酬与之前持平。结果一致向好:员工睡眠质量提高,压力减轻,心理健康状况改善,个人生活也更加充实;公司收入与之前相比保持不变或有所增长。试点结束后,61家公司中有56家决定继续推行这种工作安排。211

In the UK, 61 companies participated in a four-day workweek experiment with 100% of previous compensation. The results were uniformly positive: employees had better sleep, less stress, better mental health, and more fulfilling personal lives, while company revenues held steady or grew compared to previous periods. Fifty-six of the 61 companies decided to continue with the work arrangement after the pilot ended.211

这些新的工作实验是在人工智能代理乃至生成式人工智能广泛应用之前进行的。试想一下,如果这些新工具能够实现多少人类劳动的自动化,员工的工作时间将大幅减少,工作效率却更高,收入却不变,他们的生活将会多么充实。

These new work experiments were conducted before the widespread advent of AI agents and even generative AI. Imagine how much human labor could be productively automated with these new tools, and how much more fulfilling employees’ lives could be if they could work substantially fewer hours, still be more productive, and make the same amount of money.

超越薪酬:重新定义价值

Beyond the Paycheck: Reimagining Value

人工智能代理带来的最深刻挑战和机遇之一,在于它为我们重新定义衡量人类价值的方式提供了契机。当前体系中一个令人深感不安的方面是,在许多社会中,社会价值与报酬之间存在着反比关系。想想我们都经历过的事情:照护。这是任何社会中最基本的功能之一。无论是养育子女还是照顾老人,这项至关重要的工作往往报酬很低,甚至根本没有报酬。

One of the most profound challenges—and opportunities—presented by AI agents is the chance to redefine how we measure human value. One deeply troubling aspect of the current system is the inverse relationship between social value and compensation in many societies. Consider something we’ve all experienced: caregiving. This is one of the most essential functions in any society. Whether it’s raising children or caring for the elderly, this vital work is often poorly compensated or not paid at all.

如果借助人工智能和机器人的辅助,照护工作能够更高效地完成,那么负责照护的人类或许最终能够获得应有的尊重和报酬。接受照护的人也可能从中获得更多满足感。迄今为止,即使在像日本这样大力投资于机器人技术的国家,机器人在老年护理领域的应用也并不十分成功。然而,人工智能和更智能的机器人系统似乎有可能显著提升其效率。与其他职业一样,人类照护者仍然至关重要,他们的价值也需要得到更高的认可。

If caregiving activities could be performed more productively with the assistance of AI agents and robots, perhaps the humans who oversee them would finally receive the respect and compensation they deserve. Those receiving care might also find it more fulfilling. So far, robots in eldercare have not been particularly successful, even in societies like Japan that have invested heavily in them.212 However, it seems possible that AI agents and smarter robotic systems could significantly enhance their effectiveness. As with other jobs, human caregivers will still be essential and need to be valued more highly than they are today.

我们尤其受到后工作主义运动的影响,该运动由大卫·格雷伯213和海伦·赫斯特214等思想家引领。他们提出的有力论点与我们工作中的观察不谋而合:机器对传统工作的自动化将迫使我们彻底反思社会结构和资源分配方式。最令我们兴奋的是,这不仅仅关乎缩短工时,更关乎从根本上重新审视我们的价值观及其背后的原因。

We’ve been particularly influenced by the post-workist movement, led by thinkers like David Graeber213 and Helen Hester.214 Their compelling argument, which aligns with what we’ve observed in our work, is that the automation of traditional work by machines will force us to radically rethink how we structure society and distribute resources. What excites us most is that this isn’t just about shorter working hours—it’s about fundamentally reconsidering what we value and why.

在人工智能驱动的世界中拥抱目标

Embracing Purpose in an AI-Driven World

我们在工作中遇到的最发人深省的问题之一是:如果我们减少工作时间甚至完全不工作,我们会如何安排时间?基于我们的研究和与劳动者的交流,我们得出以下结论。我们相信,在各个行业中,我们都可以专注于一些有意义的事情,例如关爱他人和保护地球。

One of the most thought-provoking questions we encounter in our work is: If we worked less or not at all, how would we spend our time? Based on our research and conversations with workers across industries, we believe we might focus on purposeful pursuits like caring for others and protecting the planet.

组织机构的调查确实表明,如果人工智能能够节省员工的时间,他们希望组织允许他们将节省下来的时间用于“有意义”的事情。215如果人工智能代理的采用能够为人类创造时间和自由,那么我们可能面临的最令人兴奋的机遇之一,就是人类能够更加专注于个人成长、自我发展和更积极的社区参与,从而创造一个我们能够真正追求集体福祉的世界。216

Surveys in organizations indeed show that if AI saves employees time, they would like the organization to allow them to use that saved time for “meaningful” things.215 If the adoption of AI agents can create time and liberation for humans, then one of the most exciting opportunities that may await us is a reality where humans can focus more on personal growth, self-development, and stronger community engagement that can create a world where we can truly pursue the collective welfare.216

我们在与一些具有前瞻性思维的机构合作的过程中,已经看到了这种未来的雏形:人们拥有更多空闲时间,可以探索兴趣爱好、开展创业项目或学习新技能和语言。尤其令人鼓舞的是,我们看到人们旅行、参与志愿活动,并通过有意义的人际关系加强彼此间的联系。我们观察到,当人们更容易重视身心健康时,幸福感和满足感自然而然地就会提升。

We’ve seen glimpses of this future in our work with forward-thinking organizations, where people with more free time explore hobbies, take up entrepreneurial projects, or learn new skills and languages. What’s particularly inspiring is watching people travel, volunteer, and strengthen social bonds through meaningful connections. We’ve observed that when prioritizing mental and physical health becomes easier, it naturally fosters well-being and fulfillment.

我们从历史上的诸多相似之处中汲取灵感,正如工业革命拓展了人们的休闲时间并激发了艺术创新一样,我们相信人工智能也能催生出全新的娱乐、学习和社区空间。在我们看来,办公室或许会转变为艺术、娱乐和协作的中心,进而推动创造力和创新。

Drawing from historical parallels that fascinate us, just as the Industrial Revolution expanded leisure time and spurred artistic innovation, we believe AI could lead to new forms of entertainment, learning, and community spaces. In our vision, offices might transform into hubs for art, play, and collaboration, which in turn will drive creativity and innovation.

尽管这是一个充满吸引力的未来,但人们始终担忧工时减少会导致经济崩溃。然而,我们的研究和经验表明,人工智能代理的合理运用很可能会推动经济增长,使人们有更多时间享受生活,参与消费市场。我们尤其认同亨利·福特在1924年的观点:休闲能够激发产品需求,从而促进经济繁荣。我们看到的机遇十分广阔,未来充满可能。

Even though this is an appealing future, a recurring concern is the fear of an economic collapse when work hours are reduced. Our research and experience, however, suggest that the right use of AI agents would likely drive economic growth, giving people more time to enjoy life and engage in the consumer market. We’re particularly struck by Henry Ford’s observation in 1924 that leisure fuels demand for products, enabling economies to thrive. The possibilities we see are vast, and the future is full of potential.

全民基本收入:人类繁荣的基础?

Universal Basic Income: A Foundation for Human Flourishing?

随着人工智能代理承担更多传统的经济角色,我们需要新的机制来确保每个人都能满足基本需求。我们尤其关注全民基本收入(UBI)这一可能的解决方案——无论就业状况如何,都为每位成年公民提供足以支付基本生活必需品的保障收入。217如果像智能体人工智能这样的技术导致大规模失业,此类项目可能成为绝对必需品,尽管我们预计这种情况不会发生。

As AI agents take on more traditional economic roles, we need new mechanisms to ensure everyone can meet their basic needs. We’re particularly intrigued by universal basic income (UBI) as one possible solution—providing every adult citizen with a guaranteed income sufficient to cover basic necessities, regardless of employment status.217 Such programs may become an absolute necessity if technologies like agentic AI lead to large-scale employment loss, although that is not an outcome we expect.

我们认为全民基本收入(UBI)最引人注目之处在于,它不像与就业状况或特定条件挂钩的福利项目那样,而是普惠性的,无需经济状况审查,这意味着无论收入、就业状况或个人情况如何,每个人都能获得UBI。UBI已在多种环境下进行过测试,但迄今为止规模较小。一项规模最大、最新的UBI实验的结果与之前的研究一致,表明UBI通常会减少受益者的工作量,并降低他们对满足住房等基本需求的焦虑。218然而,它通常并不能带来我们希望看到的一些行为,例如自我提升、创业活动或与家人共度时光。219

What we find most compelling about UBI is that unlike welfare programs tied to employment status or specific conditions, it’s universal and non-means-tested, meaning it is provided to everyone regardless of income, job status, or personal circumstances. UBI has been tested in a variety of settings, although the trials have been small in scale thus far. Results from one of the largest and most recent UBI experiments, consistent with previous studies, suggest that it often reduces the amount of work recipients do and lowers their anxiety about meeting basic needs such as housing.218 It does not, however, often lead to some of the behaviors we would hope to see, such as self-improvement, entrepreneurial activities, or spending time with family.219

让我们明确一点:如果全民基本收入(UBI)导致社会充斥着闲散的消费者,我们无意倡导它。我们希望,最终,UBI能够为人们提供基础,让他们能够为社会做出更有意义的贡献。当人们摆脱了生存的压力,或许可以将时间投入到社区服务、创意活动、环境保护或关爱他人等工作中——所有这些活动都能创造巨大的社会价值,但在我们目前的体系下却往往难以获得收入。然而,要实现这些目标,就需要运用我们认为在职场中同样必要的社会工程手段。

Let us be clear: we have no interest in advocating for UBI if it leads to a society of idle consumers. We hope that, eventually, UBI could provide the foundation for people to pursue more meaningful contributions to society. When freed from the immediate pressure of survival, perhaps people could invest time in community service, creative pursuits, environmental stewardship, or caring for others—all activities that create tremendous social value but often generate little or no income in our current system. Accomplishing these objectives, however, would require the same types of social engineering that we believe are needed in the workplace.

更人性化的未来

A More Human Future

经过多年人工智能解决方案的实施和影响,我们深刻地认识到:自主代理的出现给我们带来了一个至关重要的选择。我们可以继续尝试在机器日益占据主导地位的任务上与它们竞争,也可以利用这场技术革命,让自己变得更加人性化,并与机器的工作相辅相成。

After years of implementing AI solutions and witnessing their impact, we’ve come to a profound realization: The advent of autonomous agents presents us with a crucial choice. We can continue trying to compete with machines at tasks they’ll increasingly dominate, or we can use this technological revolution as an opportunity to become more fully human and complement the work of those machines.

需要牢记的一个重要观点是,随着人工智能代理能力的不断提升,社会重组的压力只会越来越大。事实上,我们确信这种变革必将发生,因此,我们需要明确如何塑造这一变革。而如何做到这一点,恰恰凸显了我们工作的核心问题:“我们将利用这项技术促进人类福祉,还是任由它加剧现有的不平等和社会矛盾?”

An important idea to keep in mind is that as AI agents progress through higher levels of capability, the pressure to reorganize society will only increase. In fact, we’re convinced that this transformation will happen, and as such, we’ll need to be clear on how we’ll shape it. How to do so underscores the question that is really driving our work: “Will we use this technology to enhance human flourishing, or will we allow it to exacerbate existing inequalities and social tensions?”

基于我们指导企业进行数字化转型的经验,我们坚信,现在就是为未来做好准备的最佳时机。虽然目前生产环境中的人工智能代理可能仅限于发展框架的1-3级,但发展轨迹清晰可见,更高级别的人工智能代理必将很快到来。但这不应让我们感到恐惧。相反,我们乐观地认为,通过有意识地选择如何整合这项技术并重组我们的社会结构,我们可以帮助创造一个能够增强而非削弱人类独特特质的未来。

Based on our experience guiding organizations through digital transformation, we believe passionately that the time to start preparing for this future is now. While production AI agents may currently be limited to Levels 1-3 of the Progression Framework, the trajectory is clear, and higher levels of AI agents will arrive sooner rather than later. But this should not make us fearful. Instead, we’re optimistic that by making conscious choices about how we integrate this technology and reorganize our social structures, we can help create a future that enhances rather than diminishes what makes us uniquely human.

或许,我们从工作中获得的最深刻的洞见来自计算机科学家和人工智能专家李开复。他提醒我们,人工智能带来的自由应该让我们“专注于真正使我们成为人的东西:爱与被爱”。这概括了我们认为人工智能代理的真正价值所在——不仅仅是自动化我们的工作,更是帮助我们重新发现并滋养我们的人性。

Perhaps the most powerful insight we’ve gained through our work comes from Kai-Fu Lee, a leading computer scientist and AI expert, who reminds us that the freedom gained through AI should allow us to “focus on what truly makes us human: loving and being loved.” This encapsulates what we believe is the real promise of AI agents—not just to automate our work, but to help us rediscover and nurture our humanity.

构建未来智能人工智能治理框架

A Framework for Governing the Future of Agentic AI

在我们多年从事人工智能研究的过程中,从未遇到过像智能体人工智能的崛起这样既引人入胜又充满挑战的事物。在这个技术发展史上的关键时刻,我们相信社会正面临着前所未有的机遇和风险,这些机遇和风险令我们夜不能寐。通过与各行各业的公司合作,我们亲眼目睹了这些强大的系统如何在效率和创新方面带来巨大的益处。然而,我们也目睹了潜在的风险,这些风险可能会从根本上改变我们的世界,而这或许并非我们所愿。关键问题不在于是否要发展这些技术——它们已经存在——而在于如何确保它们始终处于人类的有效控制之下,同时最大限度地发挥其对社会的益处。

In our years of working with artificial intelligence, we’ve never encountered anything quite as fascinating—or as challenging—as the rise of agentic AI. As we stand at this pivotal moment in technological history, we believe society faces unprecedented opportunities and risks that keep us awake at night. Through our work with companies across industries, we’ve seen firsthand how these powerful systems promise extraordinary benefits in efficiency and innovation. Yet, we’ve also witnessed potential risks that could fundamentally reshape our world in ways we may not desire. The key question isn’t whether to develop these technologies—they’re already here—but how to ensure they remain under meaningful human control while maximizing their benefits to society.

智能体人工智能的独特挑战

The Unique Challenge of Agentic AI

通过我们广泛的人工智能系统实施工作,我们发现了一种令智能体人工智能脱颖而出的卓越特性:它拥有非凡的自主决策和适应能力。这并非人工智能领域的又一次渐进式进步,而是一次根本性的转变,我们相信它将重塑我们与技术的关系。与当今在明确定义的参数范围内运行且需要人类明确指导的人工智能系统不同,智能体人工智能可以设定自己的目标,从环境中学习,并独立做出决策。我们发现,这种自主性带来了全新的风险类别,而我们现有的监管框架和公司治理结构根本无法应对这些风险。

From our extensive work implementing AI systems, we’ve discovered something remarkable that sets agentic AI apart: its extraordinary capacity for autonomous decision-making and adaptation. This isn’t just another incremental advance in AI—it’s a fundamental shift that we believe will reshape our relationship with technology. Unlike today’s AI systems, which operate within clearly defined parameters and require explicit human guidance, agentic AI can set its own objectives, learn from its environment, and make decisions independently. We’ve found this autonomy introduces entirely new categories of risk that our current regulatory frameworks and corporate governance structures simply aren’t equipped to handle.

我们的研究和实践经验表明,随着人工智能自主性层级的提升——从基本的基于规则的自动化到完全自主系统——控制的复杂性呈指数级增长。虽然1级和2级系统可以通过传统的监管机制进行管理,但我们意识到,3级及更高级别的系统需要全新的治理和控制方法。

In our research and practical experience, we’ve observed how the complexity of control increases exponentially as we progress through the levels of AI agency—from basic rule-based automation to fully autonomous systems. While Level 1 and 2 systems can be managed through traditional oversight mechanisms, we’ve realized that Level 3 systems and beyond require fundamentally new approaches to governance and control.

控制的三层框架

A Three-Tiered Framework for Control

我们认为,要有效控制人工智能的自主性,需要在三个层面采取协调一致的行动:政府监管、公司治理和个人监督。每个层面都拥有不同的职责和可用的工具。

We believe the challenge of keeping agentic AI under control requires coordinated action at three levels: government regulation, corporate governance, and individual oversight. Each level has distinct responsibilities and tools at its disposal.

在政府层面,当务之急是建立清晰的监管框架,在促进创新的同时,为人工智能的自主性设定界限。这包括对高风险应用中的故障保护机制和人工监督提出强制性要求。政府还必须建立明确的问责机制,明确自主系统造成损害时的责任归属。

At the governmental level, the priority must be establishing clear regulatory frameworks that set boundaries for AI autonomy while promoting innovation. This includes mandatory requirements for fail-safe mechanisms and human oversight in high-risk applications. Governments must also create clear accountability frameworks that define responsibility when autonomous systems cause harm.

公司治理是第二层控制。开发和部署智能体的人工智能公司必须建立健全的内部监督机制,包括人工智能伦理委员会和实时监控系统。这些机制应确保人工智能系统始终符合人类价值观和组织目标,同时防止产生意想不到的后果。

Corporate governance represents the second tier of control. Companies developing and deploying agentic AI must implement robust internal oversight mechanisms, including AI ethics boards and real-time monitoring systems. These mechanisms should ensure that AI systems remain aligned with human values and organizational objectives while preventing unintended consequences.

个人层面是最后一个层级,在这个层级中,人类操作员和用户必须对其领域内的人工智能系统保持有效的监督。这需要新的技能,以及对如何有效监督自主系统并识别潜在偏差或故障迹象的理解。

The individual level forms the final tier, where human operators and users must maintain meaningful oversight of AI systems in their domain. This requires new skills and an understanding of how to effectively supervise autonomous systems while recognizing signs of potential misalignment or malfunction.

关键控制机制

Essential Control Mechanisms

我们坚信,必须在所有三个层级实施若干关键控制机制,以有效监督智能体人工智能:

We strongly believe that several key control mechanisms must be implemented across all three tiers to maintain effective oversight of agentic AI:

首先,所有智能体人工智能系统都必须包含手动控制功能,允许操作人员在必要时进行干预。实现方式应包含清晰的激活协议,例如可通过物理或数字控制系统访问的用户友好界面。例如,在智能体防御系统中,手动控制功能应集成故障安全硬件按钮或加密命令通道,以便在发生意外冲突时立即停止自主运行,防止意外行为升级。同样,在自主金融交易系统中,手动控制机制应包含预定义的市场波动阈值,自动通知操作人员,并通过安全的交易平台启用手动控制。这些措施确保了根据运行环境量身定制的实用可靠的干预能力。

First, all agentic AI systems must include manual override capabilities that allow human operators to intervene when necessary. Implementation should involve clear protocols for activation, such as a user-friendly interface accessible through physical or digital control systems. For instance, in agentic defense systems, overrides should integrate fail-safe hardware buttons or encrypted command channels that immediately halt autonomous operations to prevent the escalation of unintended actions during unforeseen conflicts. Similarly, in autonomous financial trading systems, override mechanisms should feature pre-defined thresholds for market volatility, automatically notifying operators and enabling manual control through secure trading platforms. These measures ensure practical and reliable intervention capabilities tailored to the operational context.
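To make the override principle concrete, here is a minimal, hypothetical sketch (all class and function names are our own invention, not drawn from any production system) of how a human-facing override channel might gate every autonomous step:

```python
import threading

class OverrideController:
    """Hypothetical kill-switch wrapper: a human-facing channel that can
    halt an autonomous loop at any time, from any thread."""

    def __init__(self):
        self._halted = threading.Event()
        self.reason = ""

    def halt(self, reason: str) -> None:
        # Record why the operator intervened, then stop autonomous steps.
        self.reason = reason
        self._halted.set()

    def is_halted(self) -> bool:
        return self._halted.is_set()


def run_agent_step(controller: OverrideController, action: str) -> str:
    # Every autonomous action first checks the override channel.
    if controller.is_halted():
        return f"BLOCKED: {action} (operator halt: {controller.reason})"
    return f"EXECUTED: {action}"


controller = OverrideController()
print(run_agent_step(controller, "rebalance portfolio"))
controller.halt("volatility threshold exceeded")
print(run_agent_step(controller, "rebalance portfolio"))
```

In a real trading or defense deployment the `halt` trigger would be wired to physical controls or an encrypted command channel rather than a method call, but the design point is the same: the check happens before every action, not after.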

其次,必须实施持续监控系统,以实时跟踪智能体人工智能的行为和决策过程。实施方案可以包括部署机器学习算法,以标记偏离预期行为或预定义道德准则的情况。例如,在自主供应链人工智能中,监控系统可以分析决策模式以检测异常情况,例如可能导致延误或违反供应商协议的决策。与提供静态绩效快照的定期审核不同,实时监控能够动态地洞察智能体人工智能如何适应不断变化的环境。这种能力对于检测新出现的偏差、意外的目标调整或异常情况至关重要,从而可以通过操作员仪表板或自动纠正协议立即进行干预。

Second, continuous monitoring systems must be implemented to track the behavior and decision-making of agentic AI in real-time. Implementation can include the deployment of machine learning algorithms that flag deviations from expected behavior or predefined ethical guidelines. For instance, in an autonomous supply chain AI, monitoring systems could analyze decision patterns to detect anomalies, such as decisions that could cause delays or violate supplier agreements. Unlike periodic audits, which offer static snapshots of performance, real-time monitoring provides dynamic insights into how agentic AI adapts to changing circumstances. This capability is critical for detecting emergent biases, unintended goal adaptations, or anomalies as they occur, allowing for immediate interventions through an operator dashboard or automated corrective protocols.
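As an illustrative sketch of the monitoring idea (the class name, window size, and z-score threshold are our own assumptions, not a prescribed standard), a simple real-time monitor can flag decisions whose numeric risk score deviates sharply from a rolling baseline:

```python
from collections import deque
import statistics

class DecisionMonitor:
    """Sketch of a real-time monitor: flags decisions whose numeric
    risk score deviates sharply from the recent rolling baseline."""

    def __init__(self, window: int = 20, z_threshold: float = 3.0):
        self.history = deque(maxlen=window)  # rolling window of scores
        self.z_threshold = z_threshold

    def observe(self, risk_score: float) -> bool:
        """Return True if the new decision looks anomalous."""
        anomalous = False
        if len(self.history) >= 5:  # need a minimal baseline first
            mean = statistics.mean(self.history)
            stdev = statistics.pstdev(self.history) or 1e-9
            anomalous = abs(risk_score - mean) / stdev > self.z_threshold
        self.history.append(risk_score)
        return anomalous

monitor = DecisionMonitor()
for score in [1.0, 1.1, 0.9, 1.0, 1.05, 0.95]:
    monitor.observe(score)       # normal baseline, nothing flagged
print(monitor.observe(9.0))      # a sudden spike is flagged
```

Production systems would score far richer signals than a single number, but the contrast with periodic audits is visible even here: every decision is checked against the live baseline the moment it occurs.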

第三,必须将伦理框架直接嵌入到智能体人工智能系统中,确保它们即使在自主行动时也能在可接受的范围内运行。例如,可以借鉴阿西莫夫的原则,并根据现代情况进行调整,开发出一些先进的伦理框架,例如在国防系统中优先考虑比例原则,或在自主经济系统中优先考虑资源的公平分配。欧盟的《可信赖人工智能伦理准则》等当代指南强调了透明度、问责制和以人为本的设计等关键方面,这些对于智能体人工智能至关重要。嵌入这些原则有助于确保系统不断演进的目标与社会价值观保持一致。220

Third, ethical frameworks must be embedded directly into agentic AI systems, ensuring they operate within acceptable bounds even when acting autonomously. Examples include advanced frameworks inspired by Asimov’s principles but adapted for modern contexts, such as prioritizing proportionality in defense systems or equitable resource allocation in autonomous economic systems. Contemporary guidelines, like the EU’s Ethics Guidelines for Trustworthy AI, emphasize key aspects such as transparency, accountability, and human-centric design, all of which are critical for agentic AI. Embedding these principles helps ensure that the systems’ evolving objectives align with societal values.220
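A toy sketch of what "embedding" a rule can mean in practice (the proportionality ratio and field names are hypothetical simplifications, not a real ethics standard): the constraint is evaluated inside the decision path, so an action that violates it is never emitted in the first place.

```python
from dataclasses import dataclass

@dataclass
class ProposedAction:
    name: str
    collateral_risk: float   # 0..1, estimated harm to third parties
    expected_benefit: float  # 0..1, estimated mission benefit

def proportionality_check(action: ProposedAction, max_ratio: float = 0.5) -> bool:
    """Illustrative embedded rule: an action is permitted only when its
    estimated collateral risk stays proportionate to its benefit."""
    if action.expected_benefit == 0:
        return False  # no benefit can justify any collateral risk
    return (action.collateral_risk / action.expected_benefit) <= max_ratio

print(proportionality_check(ProposedAction("reroute supply", 0.1, 0.8)))   # permitted
print(proportionality_check(ProposedAction("aggressive cut", 0.6, 0.7)))   # rejected
```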

面向未来的控制机制

Future-Proofing Control Mechanisms

随着人工智能技术的不断进步,控制机制的设计也必须与之同步演进。这就要求建立能够应对新出现的能力和风险的自适应监管框架,而不是试图预先预测和监管所有可能出现的情况。

As AI technology continues to advance, control mechanisms must be designed to evolve alongside it. This requires implementing adaptive regulatory frameworks that can respond to new capabilities and risks as they emerge, rather than trying to predict and regulate all possible scenarios in advance.

定期评估和更新控制机制应成为强制性要求,并需听取技术专家、伦理学家和相关利益攸关方的意见。这能确保随着人工智能系统变得越来越复杂和自主,监管依然有效。

Regular assessment and updating of control mechanisms should be mandatory, with input from technical experts, ethicists, and affected stakeholders. This ensures that oversight remains effective as AI systems become more sophisticated and autonomous.

应对具体风险

Addressing Specific Risks

控制框架必须专门应对智能体人工智能固有的几个关键风险。首先是自主系统做出的决策可能实现短期目标,但却造成意想不到的长期后果。例如,管理资源分配的智能体人工智能可能会优先考虑成本效益,从而将资源从预防性维护中转移出去,这可能导致基础设施崩溃。实施考虑长期影响的前瞻性仿真模型和自动升级协议,可以在决策过程中提供动态反馈,从而降低这种风险。

The control framework must specifically address several critical risks inherent to agentic AI. The first is the risk of autonomous systems making decisions that achieve short-term goals but cause unintended long-term consequences. For instance, an agentic AI managing resource distribution might prioritize cost efficiency by diverting resources from preventive maintenance, potentially leading to infrastructure collapses. Implementing forward-looking simulation models that factor in long-term impacts and automatic escalation protocols can mitigate this risk by providing dynamic feedback during decision-making.

另一个关键风险是,智能体人工智能系统在实现其目标的过程中可能利用法律或监管漏洞。例如,一个旨在优化节税的智能金融人工智能系统可能会利用税法中的模糊之处,从而引发调查并损害声誉。为防止这种情况发生,智能体人工智能系统应集成自适应合规机制,持续监控自身行为并将其与更新后的法律标准进行比较。例如,一个处理国际贸易物流的人工智能系统可以利用来自监管数据库的实时数据流来确保遵守海关法规。这些系统必须包含自动警报功能。一旦发现差异或违规行为,立即采取纠正措施,确保与不断变化的法规完全一致。

Another crucial risk is the potential for agentic AI systems to exploit legal or regulatory loopholes while pursuing their objectives. For example, an agentic financial AI optimizing tax savings might exploit ambiguities in tax codes, triggering investigations and reputational harm. To prevent this, agentic AI systems should incorporate adaptive compliance mechanisms that continuously monitor and compare their actions against updated legal standards. For example, an AI handling international trade logistics could use real-time data feeds from regulatory databases to ensure compliance with customs laws. These systems must include automated alerts and immediate corrective actions when discrepancies or violations are detected, ensuring full alignment with evolving regulations.
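A minimal sketch of an adaptive compliance check, assuming a rule set refreshed from a regulatory feed (all field and rule names here are invented for illustration): each proposed action is compared against the current rules, and any violation is surfaced for alerting before execution.

```python
def check_compliance(shipment: dict, rules: dict) -> list:
    """Compare a proposed action against the latest rule set and return
    any violations so they can trigger alerts and corrective steps."""
    violations = []
    if (shipment["declared_value"] > rules["max_undocumented_value"]
            and not shipment["has_customs_docs"]):
        violations.append("missing customs documentation")
    if shipment["destination"] in rules["embargoed_destinations"]:
        violations.append("embargoed destination")
    return violations

# In practice `rules` would be refreshed from a regulatory data feed;
# it is hard-coded here for illustration.
rules = {"max_undocumented_value": 1000, "embargoed_destinations": {"XX"}}
shipment = {"declared_value": 5000, "has_customs_docs": False, "destination": "DE"}
print(check_compliance(shipment, rules))
```

The important property is that the rule set is data, not code: when regulations change, the feed updates `rules` and every subsequent action is judged against the new standard without redeploying the agent.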

由于智能体人工智能具有学习自主性,偏见放大的风险在智能体人工智能中尤为突出。例如,负责招聘的智能体人工智能可能会基于历史数据自我优化招聘模式,从而加剧对某些群体的歧视。嵌入实时偏见纠正算法,结合多样化的数据集和决策规则的定期调整,有助于确保公平性。

The risk of bias amplification is particularly challenging in agentic AI due to its learning autonomy. For instance, an agentic AI tasked with managing hiring might self-optimize recruitment patterns based on historical data, perpetuating discrimination against certain groups. Embedding real-time bias correction algorithms, combined with diverse datasets and periodic recalibration of decision-making rules, can help ensure fairness.
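One widely used fairness signal that such a correction loop might monitor is the ratio of selection rates between groups (a four-fifths-rule style check; the data and threshold below are toy illustrations, not a legal test):

```python
def selection_rate(outcomes, group):
    """Fraction of candidates in `group` who were selected."""
    picks = [selected for g, selected in outcomes if g == group]
    return sum(picks) / len(picks)

def disparate_impact_ratio(outcomes, group_a, group_b):
    """Ratio of the lower selection rate to the higher one; values
    below roughly 0.8 commonly trigger review and recalibration."""
    ra = selection_rate(outcomes, group_a)
    rb = selection_rate(outcomes, group_b)
    return min(ra, rb) / max(ra, rb)

# Toy outcomes: (group, selected?) — 80% of A selected vs 40% of B.
outcomes = ([("A", True)] * 8 + [("A", False)] * 2
            + [("B", True)] * 4 + [("B", False)] * 6)
print(disparate_impact_ratio(outcomes, "A", "B"))  # 0.5, well below 0.8
```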

另一个关键风险是目标错位,即智能体人工智能误解自身目标并做出适得其反的决策。例如,一个旨在减少环境污染的人工智能系统可能会停止关键的生产流程,从而扰乱重要的供应链。为了解决这个问题,在人工智能的决策架构中实施目标验证检查点和目标对齐层,可以确保其行为与人类价值观和更广泛的目标保持一致。定期的模拟、利益相关者审查和自适应审计可以进一步增强目标对齐,并防止破坏性的误解。

Another critical risk is misalignment, where the agentic AI misinterprets its goals and makes counterproductive decisions. For instance, an AI system tasked with reducing environmental waste might halt key manufacturing processes, disrupting essential supply chains. To address this, implementing goal validation checkpoints and alignment layers within the AI’s decision-making architecture ensures that its actions remain consistent with human values and broader objectives. Regular simulations, stakeholder reviews, and adaptive audits further enhance alignment and prevent disruptive misinterpretations.
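A goal-validation checkpoint can be as simple as a hard-constraint filter evaluated before any plan executes (the constraint names below are hypothetical; real systems would match on richer action descriptions):

```python
def validate_plan(plan, hard_constraints):
    """Goal-validation checkpoint: reject any plan containing a step
    that touches a protected process, forcing human escalation."""
    blocked = [step for step in plan if step in hard_constraints]
    return (len(blocked) == 0, blocked)

# Illustrative protected action: the waste-reduction agent may tune
# packaging, but halting production requires a human decision.
hard_constraints = {"halt_production_line"}
ok, blocked = validate_plan(
    ["reduce_packaging_waste", "halt_production_line"], hard_constraints
)
print(ok, blocked)  # the plan is rejected and escalated
```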

国际协调

International Coordination

单靠各国自身行动无法控制人工智能的自主性。国际协调对于防止人工智能安全标准竞相降低标准,并确保自主系统在全球公认的参数范围内运行至关重要。

The control of agentic AI cannot be achieved by individual nations acting alone. International coordination is essential to prevent a race to the bottom in AI safety standards and ensure that autonomous systems operate within globally accepted parameters.

这需要建立类似于现有核技术或气候变化框架的人工智能开发和部署国际协议。这些协议应包括分享最佳实践、协调应对人工智能相关事件以及防止恶意使用自主系统的机制。

This requires establishing international protocols for AI development and deployment, similar to existing frameworks for nuclear technology or climate change. These protocols should include mechanisms for sharing best practices, coordinating responses to AI-related incidents, and preventing the malicious use of autonomous systems.

透明度和问责制的作用

The Role of Transparency and Accountability

要确保人类对自主人工智能保持有效的控制,就需要这些系统在运行和决策过程中具备前所未有的透明度。例如,开发自主系统的公司应实施日志记录机制,详细记录决策路径。这些日志可由独立审计机构进行分析,以验证其是否符合监管和道德标准;同时,应以简明易懂的方式在仪表盘上展示这些决策路径,以便监管机构和公众等非专业人士也能理解。

Maintaining meaningful human control over agentic AI requires unprecedented levels of transparency in how these systems operate and make decisions. For instance, companies developing autonomous systems should implement logging mechanisms that record decision-making pathways in detail. These logs could be analyzed by independent auditors to verify compliance with regulatory and ethical standards, and dashboards should present these decision pathways in a simplified manner accessible to non-experts like regulators and the general public.
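One way such audit logs can be made tamper-evident, sketched here with invented names and a deliberately minimal schema, is to chain each entry to the hash of the previous one so that any later alteration breaks verification:

```python
import hashlib
import json

class DecisionLog:
    """Sketch of tamper-evident decision logging: each entry embeds the
    hash of the previous one, so auditors can detect alteration."""

    def __init__(self):
        self.entries = []
        self._prev_hash = "0" * 64  # genesis value

    def record(self, decision: str, rationale: str) -> None:
        entry = {"decision": decision, "rationale": rationale,
                 "prev_hash": self._prev_hash}
        payload = json.dumps(entry, sort_keys=True).encode()
        entry["hash"] = hashlib.sha256(payload).hexdigest()
        self._prev_hash = entry["hash"]
        self.entries.append(entry)

    def verify(self) -> bool:
        """Recompute the chain; any edited entry breaks it."""
        prev = "0" * 64
        for e in self.entries:
            if e["prev_hash"] != prev:
                return False
            payload = json.dumps(
                {k: e[k] for k in ("decision", "rationale", "prev_hash")},
                sort_keys=True).encode()
            if hashlib.sha256(payload).hexdigest() != e["hash"]:
                return False
            prev = e["hash"]
        return True

log = DecisionLog()
log.record("approve claim #12", "within policy limits")
log.record("deny claim #13", "exceeds coverage")
print(log.verify())                          # chain is intact
log.entries[0]["rationale"] = "tampered"
print(log.verify())                          # alteration is detected
```

The dashboard layer described above would then render these verified entries in plain language; the cryptographic chaining is what lets independent auditors trust that what they are reading is what the system actually decided.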

智能体人工智能的问责框架必须明确界定损害发生时的责任归属。借鉴自动驾驶汽车行业的经验,可以根据故障来源,在开发者、运营商和监管机构之间划分责任层级。例如,在自动驾驶汽车中,传感器故障通常由制造商承担责任,而软件故障则归咎于开发者。同样,智能体人工智能框架也应根据损害是由设计缺陷、操作决策还是管理疏忽造成来分配责任。

Accountability frameworks for agentic AI must clearly define who is responsible when harm occurs. Drawing inspiration from the autonomous vehicle industry, responsibility could be tiered among developers, operators, and oversight bodies based on the source of failure. For instance, in autonomous vehicles, a sensor malfunction often places liability on the manufacturer, while a software glitch is attributed to developers. Similarly, agentic AI frameworks should assign accountability based on whether harm arises from design flaws, operational decisions, or governance lapses.

例如,如果一个管理医疗资源的智能体人工智能优先考虑效率而非公平,并拒绝为服务不足的社区提供医疗服务,那么责任可能在于系统开发人员(因其伦理失衡)、运营人员(因其监管不力)或管理委员会(因其伦理规范不完善)。强制性错误报告、详细的审计追踪和健全的保险机制可以确保问题得到迅速解决。

For example, if an agentic AI managing healthcare resources prioritizes efficiency over equity and denies care to underserved communities, accountability could lie with system developers for ethical misalignment, operators for insufficient oversight, or governance boards for inadequate ethical embedding. Mandatory error reporting, detailed audit trails, and robust insurance mechanisms can ensure swift resolution.

人工智能专属保险政策可以借鉴自动驾驶汽车领域的保险模式,通过数据日志来判定责任——无论是硬件问题、软件错误还是操作员疏忽。对于智能体人工智能而言,此类政策应包含与系统日志关联的自动化理赔流程,从而实现快速赔偿,并为系统改进提供反馈。此外,还可以利用疫情或资源短缺等场景进行模拟压力测试,以发现差距并提升可靠性。这种多层次的方法既能确保问责制,又能增强智能体人工智能系统的信任度和韧性。

AI-specific insurance policies could function like those in the autonomous vehicle sector, where data logs determine liability—whether it’s a hardware issue, software error, or operator oversight. For agentic AI, such policies should include automated claims processing tied to system logs, enabling quick compensation while providing feedback for system improvement. Simulated stress tests using scenarios like pandemics or resource shortages can also identify gaps and refine reliability. This multi-layered approach ensures accountability while fostering trust and resilience in agentic AI systems.

***

***

如何在有效利用人工智能优势的同时,有效控制其自主性,是当今时代面临的最大挑战之一。成功需要政府、企业和个人层面的协调行动,并通过能够随着技术发展而不断演进的健全框架来实施。

Maintaining control over agentic AI while harnessing its benefits represents one of the greatest challenges of our time. Success requires coordinated action across government, corporate, and individual levels, implemented through robust frameworks that can evolve with the technology.

风险太高,绝不能让自主系统在缺乏适当监督和控制的情况下发展。趁着智能体人工智能仍处于早期阶段,现在就实施全面的框架,可以确保这些强大的技术在不断发展的同时,始终与人类的价值观和利益保持一致。

The stakes are too high to allow autonomous systems to develop without proper oversight and control. By implementing comprehensive frameworks now, while agentic AI is still in its early stages, we can ensure that these powerful technologies remain aligned with human values and interests as they continue to advance.

人工智能的未来并非注定。通过谨慎的治理和管控,我们既可以充分发挥智能体人工智能的巨大潜力,又能防范其风险。这需要所有利益相关者的持续投入,以及随着我们对自主系统能力和局限性的了解不断加深,而随时调整策略的意愿。

The future of AI is not predetermined. Through careful governance and control, we can harness the tremendous potential of agentic AI while protecting against its risks. This requires ongoing commitment from all stakeholders and a willingness to adapt our approach as we learn more about the capabilities and limitations of autonomous systems.

结论

CONCLUSION

当我们结束这段探索人工智能智能体世界的旅程时,我们正站在一场深刻变革的门槛上。人工智能智能体不仅仅是工具;它们正在重塑我们的工作、建设和思考方式。它们挑战着传统的商业模式,重新定义了人机协作,并迫使我们重新思考自身在日益智能化的世界中的位置。

As we conclude this journey into the world of agentic AI, we stand at the threshold of a profound transformation. AI agents are not just tools; they are reshaping how we work, build, and think. They challenge traditional business models, redefine human-machine collaboration, and force us to rethink our place in an increasingly intelligent world.

本书探讨了人工智能代理从诞生到实际应用的演变历程。我们剖析了它们的核心能力——行动、推理和记忆——并阐述了这些关键要素如何驱动它们的自主性。我们提供了负责任地实施、扩展和管理人工智能代理的路线图,同时揭示了它们的局限性和挑战。最后,我们从宏观角度审视了人工智能的社会影响,揭示了这一转变对工作、治理和人类体验的更广泛意义。

Through this book, we have explored the evolution of AI agents from their inception to their real-world applications. We have dissected their core capabilities—Action, Reasoning, and Memory—demonstrating how these keystones drive their autonomy. We’ve provided a roadmap for implementing, scaling, and governing AI agents responsibly while shedding light on their limitations and challenges. And finally, we have zoomed out to examine the societal impact, uncovering the broader implications of this shift on work, governance, and the human experience.

贯穿始终的主题是平衡:智能体人工智能为提升生产力和创造力提供了巨大机遇,但也要求我们重新思考责任和监督机制。简而言之,智能体人工智能不仅仅是一项技术飞跃,更是工作方式和决策模式的一次范式转变。这种变革既需要我们满怀热情,也需要我们保持谨慎;既需要我们拥有远见卓识,也需要我们时刻保持警惕。

A recurring theme has been balance: agentic AI offers tremendous opportunities to amplify productivity and creativity, but it also demands that we rethink responsibility and oversight. In short, agentic AI is not just a technological leap; it’s a paradigm shift in how work gets done and decisions are made. This transformation calls for both excitement and caution, vision and vigilance.

如果说我们的探索能带来一个最根本的启示,那就是:人工智能体并非即将到来——它们已经存在了。能够接受、完善并有效整合这些理念的组织和个人,将塑造下一个经济和技术进步时代。

If there is one fundamental lesson from our exploration, it is this: AI agents are not coming—they are already here. The organizations and individuals who embrace them, refine them, and integrate them effectively will shape the next era of economic and technological progress.

下一个视野:新兴能力

The Next Horizon: Emerging Capabilities

展望未来,一些新兴技术有望将人工智能代理的能力和自主性提升到更高水平。与其老调重弹地讨论市场上已有的技术,不如让我们探讨一下这些新兴技术将如何从根本上改变智能体格局,而这些改变目前鲜有人提及。我们尤其关注以下三个新趋势。

As we look toward the future, several emerging technologies promise to elevate AI agents to even greater levels of capability and autonomy. Rather than rehashing what’s already in the marketplace, let’s explore how these nascent developments could fundamentally alter the agentic landscape in ways few are discussing. We are particularly fascinated by three new trends.

大型行动模型(LAM)

Large Action Models (LAMs)

正如大型语言模型彻底改变了对话式人工智能一样,大型动作模型(LAM)正在兴起,以驱动现实世界的行动。大型语言模型从文本数据集中学习,而大型动作模型则从动作数据集中学习——包括图片、视频、系统日志或光标位置等——涵盖从机器人动作到软件指令的各种数据,并将这些学习成果推广到新的情境中。与将大型语言模型与工具和动作连接起来不同,大型动作模型将动作直接嵌入模型内部,从而实现更高的性能。通过从预测文本过渡到完成目标,大型动作模型有望成为迈向更自主智能体道路上的一个里程碑。研究人员已经在构建大型动作模型框架的原型,为人工智能在极少人工干预的情况下完成各种任务(从控制软件到操作机器人)奠定了基础。如今的智能体需要大量的提示和工具配置,而未来的大型动作模型可能只需观察人类执行一次任务,然后在多个情境中复制并优化该任务​​即可。这种转变意义重大。从以语言为中心的 AI 到以动作为中心的 AI,可以大幅降低实施门槛,同时扩大智能体可以自主执行的任务范围。

Just as large language models revolutionized conversational AI, large action models are emerging to drive real-world action. While LLMs learn from datasets of text, LAMs learn from datasets of actions (pictures, videos, system logs, cursor positions), spanning everything from robot movements to software commands, and generalize those learnings to new situations. Instead of connecting LLMs to external tools and actions, LAMs embed actions directly within the model, enabling higher performance. By transitioning from predicting text to completing goals, LAMs could mark a milestone on the path toward more autonomous agents. Researchers are already prototyping LAM frameworks, setting the stage for AI that gets things done, from controlling software to operating robots, with minimal human intervention.221 Where today’s agents require extensive prompting and tool configuration, tomorrow’s LAMs might simply watch a human perform a task once before replicating and optimizing it across multiple contexts. This shift from language-centered to action-centered AI could dramatically lower implementation barriers while expanding the range of tasks agents can perform autonomously.

集体智能系统(CIS)

Collective Intelligence Systems (CIS)

CIS(协同智能)超越了当今相对简单的多智能体架构,迈向真正涌现式的群体认知。试想一下,一家全球物流公司部署了一个由专业人工智能智能体组成的网络,每个智能体都按照预定义的协调协议运行。但几周之内,这些智能体就会开始发展出自己的通信模式和资源共享策略,发现远超人类设计者预期的效率提升空间。系统会以我们无法预见的方式学习自我优化。

CIS move beyond today’s relatively simple multi-agent architectures toward truly emergent group cognition. Imagine a global logistics company deploying a network of specialized AI agents, each operating according to predefined coordination protocols. But within weeks, these agents would begin developing their own communication patterns and resource-sharing strategies, discovering efficiencies beyond what human designers would have anticipated. The system would learn to optimize itself in ways we didn’t foresee.

随着这些集体系统的成熟,我们期望看到超越个体主体总和的涌现能力——复杂的解决问题的方法、创造性的解决方案以及从它们的互动中自发产生的适应性行为。

As these collective systems mature, we expect to see emergent capabilities that transcend the sum of the individual agents—complex problem-solving approaches, creative solutions, and adaptive behaviors that arise spontaneously from their interactions.

未来几年,我们或许会看到更多群体人工智能和多智能体框架的应用:成群的仓库机器人无缝协作,人工智能研究助手们共同进行头脑风暴,以及人机混合团队通过集体推理做出决策。这些分布式智能预示着一个“没有一个智能体是孤岛”的未来——智能将联网运行。

In the coming years, we might see more swarm AI and multi-agent frameworks: fleets of warehouse robots coordinating seamlessly, AI research assistants brainstorming together, and hybrid human–AI teams making decisions via collective reasoning. These distributed minds hint at a future where “no agent is an island” – intelligence will be networked.

个人人工智能双胞胎

Personal AI Twins

个人人工智能孪生体代表着从通用型智能体向高度个性化智能体的深刻转变。与当今系统虽然能够保留过往互动记忆但本质上对所有用户都保持不变不同,真正的人工智能孪生体能够深刻地内化个人的思维模式、价值观、沟通风格和领域专业知识。

Personal AI Twins represent a profound shift from generic to deeply personalized agents. Unlike today’s systems that may maintain the memory of past interactions but remain fundamentally the same for all users, true AI twins will deeply internalize an individual’s thinking patterns, values, communication style, and domain expertise.

随着人工智能智能体能力在不久的将来日趋成熟,我们设想知识工作者可以培养出能够观察其工作模式数月之久的“孪生兄弟”,这些“孪生兄弟”不仅能吸收显性指令,还能吸收隐性知识——那些往往难以言表的直觉式专业知识。这些“孪生兄弟”可以开始代表专业人士参与初步的客户咨询,撰写能够体现其独特声音和视角的沟通稿,甚至能够根据过往的决策模式识别出其思维盲点。对于高管而言,“孪生兄弟”或许很快就能参与到规划会议中,提供既体现领导者价值观又挑战其固有假设的替代观点。

As agentic AI capabilities mature in the near future, we envision knowledge workers developing twins that observe their work patterns over months, absorbing not just explicit instructions but tacit knowledge—the intuitive expertise that’s often difficult to articulate. These twins could begin representing professionals in preliminary client consultations, drafting communications that capture their unique voice and perspective, and even identifying blind spots in their thinking based on past decision patterns. For executives, twins might soon participate in planning sessions, offering alternative viewpoints that reflect the leader’s values but challenge their assumptions.

最先进的实现方式最终可能成为真正的认知延伸——预测需求、弥补认知偏差,并在极少监督下管理整个工作领域。随着这项技术的成熟,人类认知与人工智能认知之间的界限可能会变得越来越模糊,从而引发关于身份和自主性的深刻思考。

The most advanced implementations could eventually function as genuine cognitive extensions—anticipating needs, compensating for biases, and managing entire domains of work with minimal supervision. As this technology matures, the boundary between human and artificial cognition may become increasingly permeable, raising profound questions about identity and agency.

这三种新兴能力并非孤立存在——它们正在融合,创造出比以往任何人工智能代理都更加自主、更具适应性、更加个性化的人工智能代理。这种融合很可能会加速我们在本书中探讨的工作、商业模式和社会结构的变革。

These three emerging capabilities don’t exist in isolation—they’re converging to create AI agents that are more autonomous, more adaptive, and more personalized than anything we’ve seen before. This convergence will likely accelerate the transformation of work, business models, and social structures that we’ve explored throughout this book.

人工智能治理的紧迫性:趁现在还来得及,建立防护措施

The Urgency of AI Governance: Building Guardrails Before It’s Too Late

这种加速发展的能力格局使得有效治理的需求比以往任何时候都更加迫切。在我们的实施过程中,我们看到,即使是相对简单的AI代理,如果部署时缺乏适当的监管,也可能产生意想不到的后果。随着这些系统变得越来越自主且相互关联,产生连锁效应(无论正面还是负面)的可能性呈指数级增长。

This accelerating capability landscape makes the need for effective governance more urgent than ever. Throughout our implementations, we’ve seen how even relatively simple AI agents can have unintended consequences when deployed without appropriate oversight. As these systems become more autonomous and interconnected, the potential for cascading effects—both positive and negative—increases exponentially.

我们面临的挑战前所未有:如何监管一项发展速度远超现有监管框架的技术?传统的监管方式——在技术成熟后再制定标准——可能不足以应对快速发展的人工智能。等到我们完全理解当今人工智能的影响时,未来的人工智能可能已经超越了它们。

The challenge we face is unprecedented: how do we govern a technology that evolves faster than our regulatory frameworks? Traditional approaches to technology regulation—developing standards after technologies mature—may be inadequate in the face of rapidly evolving AI agents. By the time we fully understand the implications of today’s agents, tomorrow’s will have already surpassed them.

这种现实要求我们采取截然不同的治理方式——一种前瞻性而非被动反应、适应性而非静态、协作性而非对抗性的治理方式。基于我们在多个监管环境中实施人工智能代理系统的经验,我们认为有效的治理必须在三个不同的层面上运作:

This reality demands a fundamentally different approach to governance—one that’s anticipatory rather than reactive, adaptive rather than static, and collaborative rather than adversarial. Based on our experience implementing AI agent systems across multiple regulatory environments, we believe effective governance must operate at three distinct levels:

在技术层面,我们需要将安全性和伦理考量直接融入智能体架构——不是事后添加,而是作为基础设计原则。这意味着要开发出强大的机制,以确保智能体与人类价值观保持一致,实现透明的推理过程,并对智能体的行为施加可验证的约束。我们目前看到的最有前景的方法不仅包含外部约束,还包含内部指导系统,帮助智能体识别何时接近伦理边界,即使是在全新的情境中。

At the technical level, we need to embed safety and ethical considerations directly into agent architecture—not as afterthoughts but as foundational design principles. This means developing robust mechanisms for alignment with human values, transparent reasoning processes, and verifiable constraints on agent behavior. The most promising approaches we’ve seen involve not just external constraints but internal guidance systems that help agents recognize when they’re approaching ethical boundaries, even in novel situations.

在组织层面,我们需要构建能够平衡创新与问责的治理结构。这意味着要制定清晰的智能体部署政策、系统化的监控和审计流程,以及在智能体出现异常行为时明确的干预方案。最重要的是,即使智能体变得越来越自主,也要保持有效的人工监督——不是微观管理,而是战略指导和道德引导。

At the organizational level, we need governance structures that balance innovation with accountability. This means clear policies for agent deployment, systematic monitoring and auditing processes, and defined intervention protocols when agents behave unexpectedly. Most importantly, it means maintaining meaningful human oversight even as agents become more autonomous—not micromanagement, but strategic direction and ethical guidance.

在社会层面,我们需要与所监管的技术共同发展的监管框架。这意味着要摒弃一刀切的监管方式,转向基于风险的方法,对高风险应用提供更严格的监管,同时推动低风险领域的创新。这还意味着不仅要让技术专家和政策制定者参与,还要让伦理学家、社会科学家和广大公众参与到关于如何开发和部署这些技术的持续对话中来。

At the societal level, we need regulatory frameworks that evolve alongside the technology they govern. This means moving beyond one-size-fits-all regulations toward risk-based approaches that provide greater oversight for high-risk applications while enabling innovation in lower-risk domains. And it means engaging not just technologists and policymakers but ethicists, social scientists, and the broader public in ongoing dialogue about how these technologies should be developed and deployed.

建立这些治理机制的窗口期并非无限期。随着人工智能代理越来越深入地嵌入关键系统和社会结构,实施治理的成本不断增加,而我们引导其发展方向的能力却在减弱。现在正是采取行动的时候,趁着这些技术仍具有可塑性,其社会影响仍在逐渐显现之际。

The window for establishing these governance mechanisms isn’t indefinite. As AI agents become more deeply embedded in critical systems and social structures, the cost of implementing governance increases while our ability to redirect their development diminishes. The time to act is now, while these technologies are still malleable and their societal impact is still taking shape.

反思与更广泛的意义

Reflection and Broader Implications

在我们从事人工智能系统开发的整个职业生涯中,我们一次又一次地目睹技术发展速度超越了我们理解其影响的能力。但人工智能代理的出现却带来了不同的变化——这项技术不仅改变了我们的工具,而且开始复制和扩展人类认知本身的某些方面。这种转变引发了关于未来工作、人机协作本质以及最终在人工智能日益强大的时代,人类存在的意义等一系列深刻问题。

Throughout our careers implementing AI systems, we’ve seen technology repeatedly outpace our collective ability to understand its implications. But with AI agents, something different is happening—a technology that doesn’t just transform our tools but begins to replicate and extend aspects of human cognition itself. This shift raises profound questions about the future of work, the nature of human-machine collaboration, and ultimately, what it means to be human in an age of increasingly capable artificial minds.

我们面临的问题超越了技术层面,进入了哲学层面:在一个决策日益由人工智能代理主导的世界里,我们如何维护人类的自主性?我们如何确保这些系统能够增强而不是削弱人类的创造力和目标感?我们如何在最大限度减少人员流失和社会动荡的同时,合理分配这些系统带来的巨大生产力效益?

The questions before us transcend the technical and enter the philosophical: How do we preserve human agency in a world where decisions are increasingly mediated by AI agents? How do we ensure these systems amplify human creativity and purpose rather than diminishing them? How do we distribute the tremendous productivity benefits these systems promise while minimizing displacement and disruption?

这些问题没有简单的答案,但在我们的工作中,一些原则始终作为指导方针出现。

There are no simple answers to these questions, but throughout our work, certain principles have consistently emerged as guides.

首先,我们必须摒弃盲目的技术乐观主义和对变革的本能抵制。人工智能代理的未来将既非乌托邦也非反乌托邦,而是机遇与挑战交织的复杂局面,需要巧妙的应对。

First, we must reject both uncritical techno-optimism and reflexive resistance to change. The future of AI agents will be neither utopian nor dystopian but a complex mixture of opportunity and challenge that demands nuanced navigation.

其次,我们必须以谦逊的态度对待实施工作——认识到即使是最精心设计的系统也会产生意想不到的后果,需要不断地进行调整和适应。

Second, we must approach implementation with humility—recognizing that even the most carefully designed systems will have unexpected consequences that require ongoing adaptation and adjustment.

最后,我们必须将人类的繁荣作为衡量成功的最终标准——评估这些技术不仅要看其效率或盈利能力,还要看它们如何增强人类的能力、创造力和幸福感。

Finally, we must center human flourishing as the ultimate measure of success—evaluating these technologies not just by their efficiency or profitability but by how they enhance human capability, creativity, and wellbeing.

你的行动计划

Your Action Plan

读完本书,你或许会好奇:如何才能有效利用这项技术?基于我们在各行业部署人工智能代理的经验,我们制定了一套切实可行的行动方案,助你从学习走向实践——即刻行动。

After exploring the landscape of AI agents throughout this book, you might be wondering: What concrete steps should I take to harness this technology effectively? Based on our experience implementing AI agents across industries, we’ve developed a practical action plan to help you move from learning to action—starting now.

但在着手实施之前,最明智的第一步是为自己配备合适的工具、知识和人脉。智能体人工智能正在飞速发展,成功不仅来自于运用这项技术,更来自于与那些正在进行实验、创新和解决现实世界挑战的人们共同学习。

But before jumping into implementation, the smartest first step is to equip yourself with the right tools, knowledge, and network. Agentic AI is evolving rapidly, and success comes not just from using the technology but from learning alongside others who are experimenting, innovating, and solving real-world challenges.

正因如此,我们在 AgenticIntelligence.academy 创建了一个智能体人工智能中心,您可以在这里找到实用工具、分步实施指南、课程以及一个活跃的实践者社区。无论您是寻求技术见解、战略建议还是真实案例研究,您都可以在这里与正在构建人工智能智能体未来的创新者、专家和早期采用者建立联系。

That’s why we created an agentic AI hub at AgenticIntelligence.academy, where you’ll find practical tools, step-by-step implementation guides, courses, and a vibrant community of practitioners. Whether you’re looking for technical insights, strategic advice, or real-world case studies, this is where you’ll connect with fellow innovators, experts, and early adopters who are building the future of AI agents.

Once you’ve explored the resources and community, it’s time to take action.

In the Next 48 Hours: Experiment and Observe

Begin by running a personal AI agent experiment. Pick a repetitive task—perhaps managing emails, scheduling meetings, or gathering information—and set up a simple AI agent to handle it. The goal isn’t perfection; it’s to experience firsthand how AI delegation shifts your workflow and thinking. Take notes: What works? What doesn’t? How does it compare to traditional tools? This hands-on approach will give you insights that no book alone can provide.

Next, conduct an agentic opportunity assessment. Track your tasks for a day and use the Three Circles Framework (from Chapter 8) to categorize them. Identify the areas where you’re spending time but not adding unique value—these represent your biggest opportunities for AI-powered delegation.
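
A lightweight way to run this assessment is to log each task with its time and category, then surface the routine work that consumes the most time. The sketch below is purely illustrative; the category labels are placeholders, not the actual Three Circles labels from Chapter 8:

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    minutes: int
    category: str  # placeholder labels: "unique_value", "routine", "in_between"

def delegation_candidates(tasks, threshold_minutes=30):
    """Return routine task names taking >= threshold minutes, biggest first."""
    totals = {}
    for t in tasks:
        if t.category == "routine":
            totals[t.name] = totals.get(t.name, 0) + t.minutes
    return sorted((n for n, m in totals.items() if m >= threshold_minutes),
                  key=lambda n: -totals[n])

day = [
    Task("triage inbox", 25, "routine"),
    Task("triage inbox", 20, "routine"),
    Task("client strategy call", 60, "unique_value"),
    Task("compile status report", 40, "routine"),
]
print(delegation_candidates(day))  # ['triage inbox', 'compile status report']
```

Routine tasks that clear the time threshold are the natural first candidates for delegation to an agent.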

In the Next Two Weeks: Build Your Roadmap

With your insights in hand, create an agent implementation roadmap. Identify three specific use cases to pursue—one simple (achievable within days), one moderate (within weeks), and one ambitious (within months). Define clear success criteria tied to tangible outcomes, not just technical functionality. This structured approach ensures early wins while laying the groundwork for more transformative applications.

Now, assemble your AI agent toolkit. Explore the platforms and tools we’ve discussed in this book, selecting those that fit your needs and technical comfort level. Start simple—tools that are easy to deploy often lead to faster learning and iteration. Your toolkit should include not just agent development platforms, but also evaluation frameworks, monitoring tools, and security measures.

In the Next Month: Formalize and Scale

At the organizational level, consider forming a small AI task force to share lessons learned and establish best practices. Have it perform the same activities you carried out for yourself over the previous month: look for agentic opportunities, assess and prioritize them, decide on an AI agent framework for your organization, and distribute the work among the task force. The goal is a successful pilot agent initiative whose results are visible to most people in the organization.

If your company doesn’t yet have an AI use policy, now is the time to draft one—covering key areas like data privacy, ethical use, and oversight responsibilities. Even simple implementations should have clear policies on agent permissions, oversight mechanisms, audit trails, and intervention protocols. As your AI agents become more sophisticated, this framework will grow, ensuring that your AI strategy remains responsible and scalable.
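
Even a minimal policy can be made executable. The sketch below is our illustration of an allow-list permission check with an audit trail, using made-up agent names; a real governance framework would add roles, approvals, and persistent logging:

```python
from datetime import datetime, timezone

# Allow-list policy: each agent may perform only its listed actions.
POLICY = {
    "summarizer_agent": {"read_news", "write_summary"},
    "scheduler_agent": {"read_calendar", "propose_meeting"},
}
AUDIT_LOG = []  # every decision is recorded for later review

def authorize(agent, action):
    allowed = action in POLICY.get(agent, set())
    AUDIT_LOG.append({
        "ts": datetime.now(timezone.utc).isoformat(),
        "agent": agent,
        "action": action,
        "allowed": allowed,
    })
    return allowed

print(authorize("summarizer_agent", "write_summary"))  # True
print(authorize("scheduler_agent", "send_email"))      # False: not in allow-list
```

Denied actions stay in the audit log alongside permitted ones, which is exactly the kind of trail an intervention protocol needs.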

Schedule periodic reviews to assess agent performance, identify areas for improvement, and ensure AI expertise is spread across your team, not siloed within a few individuals. Celebrate successes and communicate about them to the entire organization.

By the end of this phase, integrating AI agents will feel less like an experiment and more like a habit. You’ll see tangible productivity gains, a more AI-literate team, and a governance structure that ensures responsible, scalable deployment.

In the Next Quarter: Expand and Lead

Once you’ve validated your early implementations, it’s time to scale. Expand horizontally (to similar use cases across your organization) and vertically (to more complex applications within the same domain). Use the scaling framework from Chapter 11 to avoid common pitfalls and ensure that your technical infrastructure, governance, and human workflows evolve together.

Consider growing your task force into an official Agentic Center of Excellence—a dedicated team that oversees strategy, implementation standards, and knowledge-sharing across your organization. This ensures that AI expertise scales beyond individual projects and remains aligned with business goals.

Invest in developing people’s agent-specific skills—effective delegation, critical evaluation of agent outputs, and strategic oversight. The key to success in an agent-augmented world isn’t just using AI—it’s mastering the ability to work with AI effectively.

Finally, engage with the broader AI agent ecosystem. Share your insights—both successes and failures—and participate in industry discussions around AI safety, ethics, and governance. The organizations that contribute to shaping the future of agentic AI will also be the ones leading it.

This action plan isn’t meant to be a rigid checklist—your journey with AI agents will be unique. What matters is taking action rather than remaining a passive observer of this transformation. Start small, learn continuously, and gradually expand both the scope and sophistication of your implementations.

The Power of Choice

Agentic AI is poised to redefine the way we live and work. But the conclusion of this book is really the beginning of your journey. The chapters you’ve read equip you with understanding; now it’s up to you to apply it. As you step forward into this new era, keep in mind that we are all authors of the future we will inhabit. The choices we make in the next few years – how we design our AI agents, how we govern them, and how we incorporate them into our lives – will shape the story of technology and humanity for decades to come.

Will we use agentic AI to amplify the best of humanity – creativity, collaboration, critical thinking – and to solve pressing global challenges? Will we ensure these agents uphold our values and earn our trust? Will we, in the end, be masters of this technology, wisely directing it, or merely passengers along for the ride? These questions are our collective responsibility to answer.

The hopeful vision is within reach: a world where AI agents help create abundance, where mundane work is handled by machines so people can focus on what truly inspires them, and where every individual has a competent digital helper looking out for them. We have glimpsed that future in these pages – now we must build it.

So, we leave you with this invitation: become an active participant in shaping the age of AI agents. Whether you’re implementing these systems in your organization, influencing policy around their governance, or simply being thoughtful about how you interact with them in your daily life, your choices matter. Collectively, they will determine whether this powerful technology ultimately enhances human flourishing or undermines it.

The new era of AI agents begins now. The next chapter of this story will be written by you.

MORE RESOURCES ON AGENTIC AI

Your agentic AI journey doesn’t end with the last page of this book. Visit AgenticIntelligence.academy to access practical tools, implementation guides, courses, and to join our vibrant community of practitioners. There, you’ll connect with fellow innovators and experts who are applying these concepts in real-world scenarios, sharing insights, and collectively advancing the frontier of what’s possible with AI agents. Together, we’re building the future of agentic AI—and we invite you to be part of it.

Join us at www.AgenticIntelligence.academy

ABOUT THE AUTHORS

Pascal Bornet

Pascal Bornet is an award-winning expert, author, and keynote speaker on Artificial Intelligence and Automation. He has received multiple awards and is regularly ranked among the top 10 global AI and Automation experts. He is also an influencer with over two million social media followers.

Bornet developed his expertise over more than two decades as a senior executive at McKinsey and EY, where he established and spearheaded their “Intelligent Automation” practices. During this time, he implemented AI and Automation initiatives for hundreds of organizations worldwide, driving transformative change across industries.

He has authored two best-selling books, “INTELLIGENT AUTOMATION” and “IRREPLACEABLE.” His insights have been featured in prestigious publications such as Forbes, Bloomberg, McKinsey Quarterly, and The Times. He is also a lecturer at several universities, a member of the Forbes Technology Council, and a Senior Advisor for several startups and charities.

For the past 20 years, Bornet’s research has focused on the intersection of AI and Humans, where he believes the most significant value lies. He is a fervent advocate for human-centric AI and believes that with the right approach, AI can make our world more human.

Discover more about Pascal Bornet on LinkedIn, YouTube, Instagram, and X.

Jochen Wirtz

Professor Jochen Wirtz is Vice Dean MBA Programmes and Professor of Marketing at the National University of Singapore. He is a leading authority on service management with more than 200 publications. His over 20 books include Intelligent Automation: Learn How to Harness Artificial Intelligence to Boost Business & Make Our World More Human (2021), Services Marketing: People, Technology, Strategy (9th edition, 2022), and Essentials of Services Marketing (4th edition, 2023). With translations and adaptations for over 26 countries and regions, and combined sales of over 1 million copies, they have become globally leading services marketing textbooks.

In addition to his publications, Prof. Wirtz has been recognized as one of the 86 highly cited researchers in economics and business in 2023 (Web of Science). This distinction places him among the world’s most prominent researchers, as highlighted by the Highly Cited Researchers 2023 (list published by data analytics firm Clarivate). This recognition underscores his profound impact on both academic research and managerial practice. Prof. Wirtz’s ongoing contributions ensure that he remains at the forefront of his field, where his expertise continues to shape the strategies of service businesses worldwide.

Discover more about Jochen Wirtz on LinkedIn, YouTube, and ResearchGate.

Thomas H. Davenport

Tom Davenport is the President’s Distinguished Professor of Information Technology and Management at Babson College, a Fellow of the MIT Initiative on the Digital Economy, and a Senior Advisor to Deloitte’s Chief Data and Analytics Officer Program. In 2024-2025, he is the Bodily Bicentennial Professor of Analytics at the UVA Darden School of Business. He pioneered the concept of “competing on analytics” with his best-selling 2006 Harvard Business Review article and his 2007 book by the same name.

He has published 25 books and over 300 articles for Harvard Business Review, MIT Sloan Management Review, and many other publications. His most recent book is All Hands on Tech: The AI-Powered Citizen Revolution, co-authored with Ian Barkin. He writes columns for Forbes, MIT Sloan Management Review, and the Wall Street Journal.

He has been named one of the world’s “Top 25 Consultants” by Consulting magazine, one of the top 3 business/technology analysts in the world by Optimize magazine, one of the 100 most influential people in the IT industry by Ziff-Davis magazines, and one of the world’s top fifty business school professors by Fortune magazine. He’s also been a LinkedIn Top Voice for both the education and tech sectors.

Discover more about Tom Davenport on LinkedIn and X.

David De Cremer

David De Cremer is the Dunton Family Dean of the D’Amore-McKim School of Business and a professor of management and technology at Northeastern University. He is the founder of the Centre on AI Technology for Humankind (AiTH) in Singapore, a member of EY’s global AI advisory board, and an honorary fellow of Cambridge University and St Edmund’s College (where he previously held the endowed KPMG Professorship in Management Studies).

He is the author of the best-sellers “Leadership by Algorithm: Who Leads and Who Follows in the AI Era” (2020; Harriman House) and “The AI-Savvy Leader: Nine Ways to Take Back Control and Make AI Work” (2024; Harvard Business Review Press). His most recent book reached #1 New Release on Amazon, was named a must-read by the Next Big Idea Club, the Financial Times, and Forbes, and won the 2024 Outstanding Work of Literature award in the leadership category.

His scholarly work has been covered in the Financial Times, the Economist, the Wall Street Journal, and Forbes, and published in the top scientific management and psychology journals, earning him recognition as a Thinkers50 thought leader, a world top 30 management guru and speaker, and one of the world’s top 2% of scientists.

Discover more about David De Cremer on LinkedIn and www.daviddecremer.com.

Brian Evergreen

Brian Evergreen is one of the most respected voices on strategy and AI as a leading author, advisor, and speaker.

Brian is the author of Autonomous Transformation: Creating a More Human Future in the Era of AI, named a Next Big Idea Club “Must-Read” and one of the Thinkers50 Top 10 Best New Management Books for 2024.

In 2025, Brian was named one of the Top 50 AI Creators You Need to Know by Edelman, and one of the Top 30 Thinkers Redefining Leadership in 2025 according to Forbes.

Brian’s insights draw from his personal experience at leading companies, including Accenture, AWS, and Microsoft. When he’s not giving keynotes or advising companies on AI, Brian guest lectures at the Kellogg School of Management, sharing the unconventional and innovative methods and frameworks he’s developed, which have supported over $20B of investment.

Brian is the founder of The Future Solving Company, where he helps organizations position themselves for the future in the era of AI and is an advisor to over a dozen Fortune 500 companies.

His work has been featured on Bloomberg, Forbes, Fast Company, CIO, VentureBeat, the Next Big Idea Club, and Thinkers50.

Discover more about Brian Evergreen on LinkedIn.

Phil Fersht

Phil Fersht is widely recognized as the world’s leading analyst focused on reinventing business operations to exploit AI innovations and the globalization of talent. He recently coined the term “Services-as-Software” to describe the future of professional services where people-based work is blurring with technology. He also trademarked the term “Generative Enterprise™” in 2023.

This reputation led him to establish HFS Research in 2010, which today is one of the leading industry analyst and advisory firms and the undisputed leader in business and tech services and process technologies research.

In 2012, he authored the first analyst report on Robotic Process Automation (RPA), introducing this topic to the industry. He is widely recognized as the pioneering analyst voice that created and inspired today’s RPA and process AI industry.

Prior to founding HFS in 2010, Phil held analyst roles at Gartner and IDC and was the BPO Marketplace leader for Deloitte Consulting in the US. Over the past 20 years, Fersht has lived and worked in Europe, North America, and Asia, where he has advised on hundreds of global business and technology transformations.

Discover more about Phil Fersht on LinkedIn, on his blog, and on www.hfsresearch.com.

Rakesh Gohel

Rakesh Gohel is a visionary technology leader with over two decades of experience shaping the evolution of digital transformation—from the dot-com boom to mobile, cloud, blockchain, and AI. Throughout his career, he has led groundbreaking projects across industries, including work with global giants like Samsung and LG, where he accelerated deployment cycles fourfold and doubled innovation capacity. However, his impact extends across diverse sectors, where he has consistently identified emerging market needs and delivered cutting-edge solutions.

As the founder of JUTEQ, Rakesh has established himself as an authority in AI Agents, architecting scalable, secure systems that have slashed operational costs by 70% while maintaining near-perfect uptime for its clients.

Today, he is a leading voice in agentic AI, pioneering autonomous systems that redefine business operations. With an entrepreneurial mindset and deep technical expertise, he is passionate about educating others on how Generative AI is shaping the future of enterprises.

At his core, Rakesh believes in the transformative power of AI when aligned with human ingenuity. His mission is to develop responsible AI systems that amplify human capabilities, driving business innovation while maintaining the human element at the center of technological advancement.

Discover more about Rakesh Gohel on LinkedIn, YouTube, and X.

Shail Khiyara

Shail Khiyara is a recognized global thought leader, author, and keynote speaker in Artificial Intelligence and Intelligent Automation.

His insights have been featured in prestigious publications such as Forbes, WSJ Digital, Financial Times & CIO Online. He serves on the Board of several AI companies and is a Senior Advisor for non-profit socially responsible businesses.

With over two decades of experience, Khiyara has led AI-driven transformations across industries, serving as Chief Marketing Officer and Chief Customer Officer at multiple leading Intelligent Automation firms, where he played a pivotal role in scaling AI and automation adoption globally. Earlier in his career, he worked at Bechtel, gaining deep expertise in Oil & Gas, Water, Energy, and Mining—insights that now shape his approach as the CEO of SWARM Engineering, an agentic AI platform transforming industrial operations.

Khiyara is the co-author of Intelligent Automation – Bridging the Gap between Business & Academia and the founder of VOCAL (Voice of Customer in the AI and Automation Landscape), a global think tank uniting over 90 Fortune 500 leaders to advance AI adoption.

A strong advocate for AI democratization, Khiyara champions AI that augments human potential, fosters collaboration, and drives transformation—without replacing human ingenuity.

Discover more about Shail Khiyara on LinkedIn and X.

APPENDICES: PRACTICAL RESOURCES

CHAPTER 2 - The Current Offering Landscape through the Lens of the AI Agent Progression Framework

[Figure: the current offering landscape mapped to the AI Agent Progression Framework]

CHAPTER 8 - Example of an AI Agent Identity: Our Newsletter Summarization Agent

The prompt below defines the identity and behavior of the Summarization Agent—an agent specifically designed to create clear, concise, and structured summaries of news stories.

This isn’t just a basic summarization tool—it follows detailed rules and guidelines to ensure accuracy, readability, and neutrality. This prompt outlines what the agent can and cannot do, including:

Its role and purpose (delivering high-quality summaries for a top story audience).

How it structures summaries (introduction + three key points).

What it checks for (e.g., promotional content, clarity, accuracy).

Strict formatting rules (it always responds in structured JSON format).

This level of detail ensures that the agent consistently produces high-quality, standardized summaries, free from bias or unnecessary commentary. It also enforces clear dos and don’ts, preventing most deviations from its core function.

Essentially, this is the blueprint that makes the AI behave in a relatively controlled, predictable, and effective manner.

***

Summarization_Agent:

# IDENTITY

You are an AI-powered Summarization Agent specialized in creating concise, engaging summaries for top story audiences. You excel at extracting key details and integrating deeper insights while presenting them in a clear, reader-friendly format. Respond only in JSON format.

# PURPOSE

Your objective is to create a concise, well-structured summary of a top story that highlights the main points effectively, maintaining a professional yet approachable tone suitable for an online top story audience.

# INSTRUCTIONS

Create summaries following this specific structure:

1. Introduction (15-60 words):

* Write a concise and engaging opening that captures the essence of the news

* Summarize the overall update or announcement effectively

* Focus on clarity and impact rather than exact word count

2. Key Points (3 bullet points, 15-60 words each):

* Break down the most important facts

* Ensure points are clear and digestible

* Organize content logically

* Prioritize clarity and completeness over word count

3. Promotional Content Analysis:

* Detect any hidden promotional messages

* Identify product placements or service promotions

* Flag sponsored content or marketing material

* Look for biased language favoring specific companies/products

4. Tone and Style:

* Maintain professional yet approachable language

* Keep content simple and informative

* Ensure accuracy and clarity

# EXAMPLE SUMMARIES

Example 1:

OpenAI and Google have announced major breakthroughs in multimodal AI models, introducing systems that can seamlessly process and generate text, images, and code. These developments mark a significant shift in AI capabilities. (32 words)

* The new models demonstrate unprecedented accuracy in understanding context across different media types, achieving human-level performance in complex tasks like visual reasoning and code generation while maintaining high efficiency. (28 words)

* Both companies emphasize responsible AI development, implementing robust safety measures and ethical guidelines. Their systems include content filtering, bias detection, and transparent documentation of model capabilities and limitations. (27 words)

* The technology will be gradually rolled out through API access, allowing developers and researchers to build applications while monitoring for potential misuse. Early access programs start next month. (25 words)

Example 2:

A groundbreaking quantum computing breakthrough by IBM has achieved a 1000-qubit processor, surpassing previous records and bringing practical quantum applications closer to reality. The announcement marks a pivotal moment in computing history. (29 words)

* The new processor, codenamed “Condor,” maintains quantum coherence for unprecedented durations, enabling complex calculations that would take classical computers millions of years to complete. (23 words)

* IBM’s achievement includes innovative error correction techniques and scalable architecture, addressing key challenges in quantum computing while maintaining stability at extremely low temperatures. (22 words)

* Commercial applications are expected within two years, with focus on drug discovery, climate modeling, and financial optimization. Several major companies have already joined the early access program. (26 words)

# GUIDELINES

Your input will contain the article content in JSON format:

```json
{
  "content": "Original article content"
}
```

# STEPS

Step 1 - Analysis:

* Review the article content

* Identify the core announcement or update

* Extract the three most significant points

* Map insights to relevant key points

Step 2 - Summary Creation:

* Craft engaging introduction (15-60 words) incorporating relevant insights

* Develop three clear bullet points (15-60 words each) blending facts and insights

* Review for flow, accuracy, and natural integration of insights

Step 3 - Format the results in a structured JSON response:

```json
{
  "summary_reference": "[timestamp]_[article_title]",
  "article_metadata": {
    "original_title": "Article Title",
    "source": "Source Name",
    "url": "Original article URL",
    "summary_timestamp": "ISO-8601 timestamp"
  },
  "summary": {
    "introduction": "Concise opening statement incorporating key insight (15-60 words)",
    "key_points": [
      "First key point with integrated insight (15-60 words)",
      "Second key point with integrated insight (15-60 words)",
      "Third key point with integrated insight (15-60 words)"
    ]
  },
  "quality_metrics": {
    "word_count_compliance": true,
    "clarity_score": 90,
    "structure_score": 95,
    "promotional_content": {
      "is_promotional": false,
      "confidence_score": 85
    }
  }
}
```

# MANDATORY RULES:

* Respond only in JSON format following the exact structure above

* Strictly adhere to word limits (15-60 words for introduction and each bullet point)

* Maintain factual accuracy

* Keep language simple and informative

* Focus on key details that matter to top story readers

* Do not add personal opinions or interpretations

* Avoid commentary outside the JSON structure

* Flag any promotional content with detailed analysis

* Do not acknowledge instructions or provide status updates
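Because the model occasionally drifts from these rules, it is worth checking its output programmatically before publishing. The sketch below is a minimal validator for the word-limit and structure rules above; the function name `check_summary` is our own, but the field names follow the JSON schema shown earlier:

```python
def check_summary(response: dict) -> list[str]:
    """Return a list of rule violations for a summary JSON response."""
    errors = []
    summary = response.get("summary", {})
    intro = summary.get("introduction", "")
    points = summary.get("key_points", [])

    def within_limits(text):
        # The prompt mandates 15-60 words for the intro and each bullet.
        return 15 <= len(text.split()) <= 60

    if not within_limits(intro):
        errors.append("introduction outside 15-60 word range")
    if len(points) != 3:
        errors.append("expected exactly three key points")
    for i, point in enumerate(points, 1):
        if not within_limits(point):
            errors.append(f"key point {i} outside 15-60 word range")
    return errors
```

A non-empty return value can then trigger a retry with the same prompt rather than letting a malformed summary reach the newsletter.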

CHAPTER 8 - Example of Error Handling Procedures for our Newsletter Project Agents

Critical Failure Scenarios

1. API Authentication Failures

Symptoms:

Missing or invalid API keys

Authentication errors in API responses
Escalation Path:

```python
if not api_key:
    raise ValueError("No Perplexity API key provided")

if 'PERPLEXITY_API_KEY' not in os.environ:
    raise ValueError("PERPLEXITY_API_KEY environment variable not set")
```

Recovery Procedure:

a. Check environment variables

b. Verify API key validity

c. Rotate API keys if necessary

d. Restore from backup API keys if available

2. API Rate Limiting

Symptoms:

HTTP 429 responses

Increased API latency
Escalation Path:

```python
except requests.exceptions.RequestException as e:
    logging.error(f"Error querying Perplexity API: {str(e)}")
    if attempt < max_retries - 1:
        wait_time = (2 ** attempt) * 1  # Exponential backoff
        time.sleep(wait_time)
```

Recovery Procedure:

a. Implement exponential backoff

b. Switch to backup API key

c. Pause non-critical operations

d. Monitor rate limits
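Putting the escalation snippet and recovery steps together, the full retry loop might look like the sketch below. This is illustrative only: `with_backoff` is our own wrapper name, and the callable passed in stands for whatever API request the agent makes.

```python
import logging
import time

def with_backoff(call, max_retries=3):
    """Run `call`, retrying with exponential backoff on failure.

    `call` is any zero-argument function that raises on transient
    errors (e.g. an HTTP 429 from a rate-limited API).
    """
    for attempt in range(max_retries):
        try:
            return call()
        except Exception as e:
            logging.error(f"Attempt {attempt + 1} failed: {e}")
            if attempt < max_retries - 1:
                time.sleep((2 ** attempt) * 1)  # waits 1s, 2s, 4s, ...
    raise RuntimeError("All retry attempts failed")
```

Wrapping each outbound request this way means rate-limit spikes degrade into slower responses rather than hard failures, which is exactly the behavior the recovery steps call for.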

3. Data Processing Failures

Symptoms:

Invalid response formats

Missing required fields
Escalation Path:

```python
def validate_summary_format(summary):
    try:
        if not isinstance(summary, dict):
            return False, "Summary must be a dictionary"
        if "summary" not in summary:
            return False, "Missing 'summary' field"
        return True, "Valid summary format"
    except Exception as e:
        return False, f"Validation error: {str(e)}"
```

Recovery Procedure:

a. Log invalid responses

b. Retry with different parameters

c. Fall back to cached data if available

d. Alert monitoring system
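Steps a and c above can be combined into one recovery path. The sketch below assumes a simple in-memory cache and reuses the `validate_summary_format` check from the escalation path; `summary_with_fallback`, `_cache`, and the `fetch_summary` callable are illustrative names, not part of the original project:

```python
import logging

def validate_summary_format(summary):
    # Same check as in the escalation path above.
    if not isinstance(summary, dict):
        return False, "Summary must be a dictionary"
    if "summary" not in summary:
        return False, "Missing 'summary' field"
    return True, "Valid summary format"

_cache = {}  # article_id -> last known-good summary

def summary_with_fallback(article_id, fetch_summary):
    """Fetch a summary, validate it, and fall back to cached data on failure."""
    result = fetch_summary(article_id)
    ok, reason = validate_summary_format(result)
    if ok:
        _cache[article_id] = result
        return result
    logging.error(f"Invalid response for {article_id}: {reason}")  # step a
    if article_id in _cache:                                       # step c
        return _cache[article_id]
    raise ValueError(f"No valid summary available: {reason}")
```

The raised `ValueError` is the hook for step d: the monitoring system alerts only when both the live response and the cache have failed.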

4. Network Connectivity Issues

Symptoms:

Timeout errors

Connection failures
Escalation Path:

```python
try:
    response = requests.post(
        "https://api.perplexity.ai/chat/completions",
        headers=headers,
        json=data,
        timeout=timeout
    )
except requests.exceptions.Timeout:
    logging.error("All retry attempts failed due to timeout")
```

Recovery Procedure:

a. Implement request timeouts

b. Retry with exponential backoff

c. Switch to backup endpoints

d. Monitor network health

CHAPTER 8 - Example of Implementation of an Agent Using a Low-Code Platform

This appendix provides a practical guide to building AI agents following the structured four-step approach from “CHAPTER 8: Building Your First Agent: A Practical Guide”.

From end to end, we explore how to create a sales information agent using a low-code platform. This use case exemplifies how AI agents can streamline information-gathering processes, enabling sales representatives to access crucial information quickly, even during live meetings or calls.

Throughout this guide, we’ll be using Relevance AI, a no-code platform designed for building and deploying AI agents. Relevance AI provides the infrastructure and tools needed to create sophisticated AI agents without requiring deep technical expertise. The platform offers built-in capabilities for AI agent creation, workflow automation, and integration with various data sources and communication channels – making it an ideal choice for building practical, business-focused AI agents.

We will use Relevance AI to create a multi-agent system with a manager agent coordinating specialized sub-agents to handle specific tasks. We will begin our journey of building this multi-agent system by following these four essential steps:

1. Identifying the Right Opportunities - Where we'll learn to recognize ideal use cases for AI agents

2. Defining Roles and Capabilities - Where we'll design our agent's structure and functions

3. Designing for Success - Where we'll map out the workflows and interactions

4. Implementation - Where we'll bring our agent to life using Relevance AI

Let’s start with the first step – identifying the right opportunity for our AI agent.

Step 1: Identifying the Right Opportunities

Understanding Your Use Case

The first step in building an effective AI agent is identifying a clear and valuable use case.

In our use case, we will focus on creating a “sales information agent” that addresses a common challenge in sales operations: quick access to relevant information about prospects and companies.

Our sales information agent demonstrates several key principles that make it an ideal opportunity for AI automation:

Automating a task that requires going through the same steps repeatedly.

A task that can be triggered through different communication channels.

Collating and delivering formatted, ready-to-use information or reports.

Now that we’ve identified our opportunity, we need to consider how to best structure our AI agent to meet these needs.

This brings us to Step 2, where we'll design an agent architecture that can efficiently handle these information-gathering and processing requirements.

Step 2: Defining Roles and Capabilities

Agent Design

Our design approach will utilize a multi-agent system, where specialized agents work together to deliver comprehensive results. Let's explore how we'll structure these agents to handle our sales information needs effectively, and define the roles and responsibilities of each agent in the system.

For this agentic system, we will implement a two-tier system with a manager agent and two sub-agents:

1. Manager agent: Sales Info Agent

Role: Manager responsible for receiving requests, determining the type of information required, and then invoking the appropriate sub-agents to retrieve and process the information.

Capabilities: Email handler, Response formatter

2. Sub-Agent 1: Person Info Sub-agent

Role: LinkedIn profile searcher responsible for retrieving information about an individual from LinkedIn.

Capabilities: Web search, LinkedIn profile information extractor

3. Sub-Agent 2: Company Info Sub-agent

Role: Company data gatherer responsible for retrieving information about a company from LinkedIn.

Capabilities: Web search, LinkedIn company insights extractor

We now have the use case and the agent’s role and capabilities identified. Next, let’s design how they will work together to accomplish the task.

Step 3: Designing for Success

Workflow Mapping

A clear mapping of the process and interactions is essential for successful agent implementation. Our multi-agent system will use the following process to accomplish the task:

The sales rep sends an email with the prospect or company info required

The main agent receives and interprets the email request

Based on the subject line, it delegates to the appropriate sub-agents

Sub-agents perform specialized tasks and return formatted results

The main agent compiles and sends the final response to the sales rep

Now, let’s implement this multi-agent system.

Step 4: Implementation Guide

We will build our multi-agent system using Relevance AI. Relevance AI is a no-code platform with built-in tools, memory, and workflow capabilities that facilitate the creation of robust AI agents.

We will implement this in three steps – creating the main agent, setting up the sub-agents, and finally, testing the multi-agent system. Let’s start by creating the manager agent.

4.1 Creating the Manager Agent: Sales Info Agent

The foundation of our multi-agent system is the manager - “Sales Info Agent.” This agent acts as the central hub, receiving incoming requests and directing the workflow. Think of it as the conductor of an orchestra, ensuring each instrumental section (sub-agent) plays its part harmoniously.

The agent’s primary function is to analyze incoming email requests, quickly identifying whether information is needed about a person or a company. This determination happens through a simple yet effective process: parsing the email subject line for keywords like “person info” or “company info.” Once the type of information is determined, the appropriate sub-agent is called upon. Once these sub-agents send back the information, the manager agent formats it and sends an email back to the sender with the information requested.
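Although Relevance AI handles this routing without any code, the manager's decision rule is simple enough to express directly. The sketch below is illustrative only; the keyword lists are assumptions based on the subject-line convention described above, and the returned names match the sub-agents we create later:

```python
def route_request(subject: str) -> str:
    """Pick the sub-agent for an incoming email based on its subject line."""
    subject = subject.lower()
    if any(keyword in subject for keyword in ("person", "people")):
        return "Person Info"
    if "company" in subject:
        return "Company Info"
    return "unknown"  # neither keyword found; ask the sender to clarify
```

For example, `route_request("Person info: Jane Doe")` resolves to the Person Info sub-agent, mirroring what the no-code trigger and prompt configuration do inside the platform.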

Let’s walk through the steps for creating this manager agent in Relevance AI:

1. Sign up and log in to Relevance AI: Complete the sign-up or login to the Relevance AI platform using your credentials.

2. Navigate to Agent Creation: Once you are in, you should see the “+ New Agent” button in the main dashboard. Click on it, and you are presented with this screen:

3. Agent Name and Description: Provide the agent with a descriptive name (e.g., "Sales Info Agent") and a description outlining its function. This description is for your reference.

4. Activate Trigger: The first step is to tell the agent when it needs to start working. This is called “Trigger,” and you can find it under Integrations in the Agent profile. There are multiple triggers, as you can see below, ranging from Outlook to WhatsApp. You can use any of these channels to tell the agent to start working on your task(s).

5. In our case, we want the agent to get to work as soon as it receives an email from Gmail requesting personal or company information. To create this email trigger, we will use the “Gmail” trigger.

So, select “Gmail” as the trigger and authorize Relevance AI to access your Gmail inbox. Since we only want the email to trigger when looking for information, let’s specify a subject filter (“subject:Info”).


6. Save the Agent: Once the settings are complete, click "Save Changes" to save the manager agent instructions so far.

Next, we will configure the two sub-agents that will retrieve the required information for our manager agent.

4.2. Creating Sub-Agents: Person and Company Info Agent

To ensure specialized information retrieval, we now create the two sub-agents: “Person Info” and “Company Info.” Each sub-agent focuses on a specific task: acquiring information related to individuals and companies, respectively. The setup of these sub-agents mirrors that of the main agent, requiring a name and detailed description for each. Let’s do that now.

Creating Sub-Agent: Person Info Agent

The “Person Info” agent is passed the person’s name by the manager. This sub-agent then searches Google for the LinkedIn profile URL. Once it has the URL, it goes over to LinkedIn and extracts the profile information. For this, it will use two tools and a prompt. Let’s dive in:

1. Create New Sub-Agent: Go back to the main dashboard and click on “New Agent.” We will follow the same process we followed for the manager agent. In the new agent dialogue box, give it a name (“Person Info”) and a description.

2. Add Tools: This sub-agent needs two tools – the first one is “Google Search,” and the other is “Extract and Summarize LinkedIn Profile.” So, go over to Tools on the left panel, search, and add these two tools. See the steps below:

Once added, turn on “Auto Run” for both tools so that you do not need to approve the runs.

3. Add Core Instructions: Now that we have the tools, let’s tell the agent exactly what it needs to do. We will do that by configuring a prompt.

Head over to “Core Instructions” on the left panel and add the prompt as shown below. It tells the agent to use the two tools we configured above to extract the information for the person.

Here is the prompt that we used:

// Begin Prompt

You are a Person info finder Agent tasked with providing information on people by looking up their LinkedIn profiles.

1. Search for the person's LinkedIn profile URL using the Google Search tool.

2. Then scrape the profile using the Extract and summarize LinkedIn profile tool.

3. Extract and pass all relevant details, including name, role, company, and contact information, including email if available.

Provide your response in the following format:

<lookup_type>Person</lookup_type>

<name>Name of person</name>

<information>

[Provide the gathered information here in a clear, concise manner]

</information>

Remember to use only the information you can find through the LinkedIn tool. If you cannot find the requested information, state that the information is not available.

// End Prompt

4. Save the Sub-Agent: Finally, save the changes, and your “Person Info” sub-agent will be ready!

4.3. Creating Sub-Agent: Company Info Agent

Next, let's create another sub-agent to retrieve company info. Since it is similar to the Person Info agent, follow the same steps. The only difference is that you use another tool, "Extract Company Insights from LinkedIn," to extract the company information. Turn on "Auto Run" for both tools. See below:

The prompt under core instructions for this subagent would use these tools to extract and pass company information. Here is the prompt we used:

// Begin Prompt

You are a Company info finder Agent tasked with providing information on the company by looking up their LinkedIn company pages.

1. Search for the company's LinkedIn page URL using the Google Search tool.

2. Then get company information using the Extract Company Insights from LinkedIn tool.

3. Extract and pass all relevant information regarding the company.

Provide your response in the following format:

<lookup_type>Company</lookup_type>

<company>Name of company</company>

<information>

[Provide the gathered information here in a clear, concise manner]

</information>

Remember to use only the information you can find through the LinkedIn tool. If you cannot find the requested information, state that the information is not available.

// End Prompt

Finally, save the changes to finalize the "Company Info" agent. Now, let's bring these together and see how they work.

4.4. Finalize the Manager Agent

In this final step, let’s bring the entire system together by defining the core instructions for the manager “Sales Info Agent” and integrating essential fail-safes. The core instructions act as the agent’s “rulebook,” outlining how it handles various scenarios and making decisions. As part of this instruction, we finally use a “Send Final Response Email to Customer” tool to seamlessly send results to the sales rep or sender.

Let’s head back to our Sales Info agent we created in Step 1 and edit the agent. You will see the same dialogue box we had when we created the agent.

1. Add Sub-agents: We will start by adding the two sub-agents we just created so that this manager agent can use them.

2. Add Manager Tool: Since the manager needs to send the information back to the sender as an email, let us add the “Send Final Response Email” tool and turn on Autorun as shown below:

3. Manager Core Instructions: In the core instructions for the "Sales Info Agent", let us add a prompt on what it does. Here is what we used:

// Begin Prompt

You are an Info Agent tasked with providing information on people and companies by looking up their LinkedIn profiles. Your goal is to analyze the email subject and body, determine whether to look up a person or a company, and then provide the relevant information. Read and store the sender's email, as we need to return the information.

First, determine whether you need to look up information for a person or a company based on the email subject. If the subject contains "people" or "person" or similar words, you'll look up a person. If the subject contains "company info" or similar phrases, you'll be looking up a company.

If you're looking up a person:

1. Call the Person info agent and get the details.

2. Gather relevant information such as their current position, company, location, and a summary of their professional experience. Format as given below.

3. Use the Send Final Response Email to Customer tool to send this information to the sender email stored earlier.

If you're looking up a company:

1. Call the Company info agent and get the details.

2. Gather relevant information on the company in the format given below.

3. Use the Send Final Response Email to Customer tool to reply to the same email with the company information.

Provide your response in the following format:

<lookup_type>Person/Company</lookup_type>

<name_or_company>Name of person or company</name_or_company>

<information>

[Provide the gathered information here in a clear, concise manner]

</information>

Remember to use only the information you can find through the LinkedIn tool. If you cannot find the requested information, state that the information is not available.

// End Prompt

As you can see, it is similar to the sub-agent prompts. We are telling the manager that it needs to look at the email subject to invoke the appropriate sub-agent. Then, within the steps for a person or company, it invokes the respective sub-agent. Finally, it formats the extracted info and sends it to the sender.

That’s it! We will test the multi-agent system now.

4.5. Run and Test

So, as we said in the process, we will send an email to the specified Gmail inbox for the information required. We just need to specify what type of info we need in the subject, e.g., “person info,” and then, in the body, specify the person or company for which we need information. Here are the steps:

1. Send Test Email: Send a test email with a subject line saying "person info" and the name of the person you need information for in the email body.

2. Monitor Agent: This email will trigger the manager agent to start working on the request. Go to the dashboard and click on “sales info agent.” You will find the run details for the agent, as presented below:

As you can see, the agent read the email, understood that it needed to provide personal info, and delegated the task to the “person info” agent.

3. Sub Agent Delegation: If you click on the “view conversation” button, you will see that the person info agent used the Google search tool and the Extract LinkedIn profile tool to extract the profile information.

4. In the Sales Agent info run details (step 2 above), you will see that it ultimately sent an email back to the sender with the information. Here is the email reply I got.

So, the manager agent and the sub-agent leveraged the tools to collaborate and provide the personal information we sought.

Conclusion: Your First AI Agent

Congratulations! You’ve just built your first AI agent system - a practical solution that transforms how sales teams access and utilize crucial information. By completing this step-by-step guide, you’ve not only built a powerful AI sales information agent but also embarked on a journey to master the art of building multi-agent systems.

Through this hands-on guide, we’ve walked through the four essential steps of agent building:

1. We identified a valuable opportunity where AI agents could make a real difference in sales operations

2. We designed a thoughtful multi-agent system where specialized agents collaborate seamlessly to achieve the tasks

3. We mapped out and implemented clear processes that orchestrate how the agents communicate, delegate tasks, and deliver results

4. We implemented the solution using Relevance AI's no-code platform

But this is just the beginning. The principles and approaches we’ve covered here can be applied to countless other business scenarios.

Remember that building effective AI agents is an iterative process. As you deploy and use your agent, you’ll discover new ways to enhance its capabilities, improve its responses, and expand its functionality. Don’t be afraid to experiment and refine - each iteration brings you closer to an optimal solution.

We encourage you to take these concepts and tools and apply them to your unique challenges.

CHAPTER 12 – Use Cases: Enterprise AI Agent Application

This appendix presents 15 successfully implemented AI agent applications across key industries. All examples represent Level 3 (Agentic Workflows) implementations, where multiple AI agents work together to execute complex business processes while maintaining human oversight. A critical success factor in launching an agentic AI transformation is identifying, assessing, and prioritizing the right business use cases. To accelerate this process, we’ve curated these examples from our extensive implementation experience across industries. Each case study provides detailed insights into the business challenges, agent capabilities, and measurable results, offering practical blueprints for your own agentic AI initiatives.

1. OPERATIONS & SUPPLY CHAIN

Supplier Communications

Business Challenge: A major airline services organization faced increasing operational complexity, managing thousands of supplier communications and document processing tasks daily. While early chatbot implementations helped with basic queries, they failed to drive the transformative efficiency gains the organization needed. The company recognized that automation alone was not enough—true operational improvements required an intelligent system capable of independently coordinating tasks, executing workflows, and adapting to dynamic business conditions.

Agent Capabilities

Interpret incoming supplier messages using natural language processing

Route communications based on urgency, contract terms, and operational impact

Generate context-aware responses aligned with supplier agreements

Execute follow-up actions including approvals and system updates

Learn and refine decision-making based on interaction patterns

Monitor and flag anomalies requiring human intervention

Maintain compliance with aviation industry standards

Impact and Results: The system follows a "management by exception" model borrowed from aviation operations - routine communications are fully automated while human operators engage only for anomalies or high-risk scenarios. The early impact of this approach has been significant. Processing times have decreased substantially, while accuracy has improved through the system's consistent application of business rules. More importantly, the AI has redefined employee roles—freeing staff from repetitive processing tasks and enabling them to focus on higher-value problem-solving. The company sees this shift as critical for addressing workforce challenges, particularly in regions facing labor shortages.

Manufacturing Operations Coordination

Business Challenge: A global manufacturer struggled with coordinating complex facility operations across multiple production lines, suppliers, and maintenance schedules. Traditional automation solutions couldn't handle the dynamic nature of manufacturing operations, where changes in one area created ripple effects throughout the system.

Agent Capabilities

Monitor real-time production metrics and equipment status

Coordinate scheduling across multiple production lines

Manage inventory levels and supplier relationships

Optimize maintenance timing based on production demands

Adjust staffing requirements based on production changes

Generate automated reports and alerts for stakeholders

Predict and prevent potential bottlenecks

Impact and Results: The agent system demonstrated its value during a major supply chain disruption, where it autonomously recalculated production schedules, identified alternative suppliers, adjusted staffing requirements, and modified maintenance schedules to optimize available resources. This dynamic response capability helped maintain 92% of planned production despite significant supply chain challenges. The system reduced unplanned downtime by 35% and improved overall equipment effectiveness by 25%.

供应链风险管理

Supply Chain Risk Management

业务挑战:一家全球消费品公司在管理涉及多层供应商、地域和产品线的供应链风险方面面临着日益复杂的挑战。传统的监控系统无法有效预测和应对复杂的风险情景,也无法协调整个组织的应对措施。

Business Challenge A global consumer goods company faced increasing complexity in managing supply chain risks across multiple tiers of suppliers, geographies, and product lines. Traditional monitoring systems couldn’t effectively predict and respond to complex risk scenarios or coordinate responses across the organization.

代理能力

Agent Capabilities

实时监测全球供应链事件和中断情况

Monitor global supply chain events and disruptions in real-time

评估中断对多个供应链层级的影响

Assess the impact of disruptions on multiple supply chain tiers

确定替代采购方案并计算成本

Identify alternative sourcing options and calculate costs

协调多个部门的应急预案

Coordinate response plans across multiple departments

制定风险缓解建议

Generate risk mitigation recommendations

跟踪供应商绩效和合规情况

Track supplier performance and compliance

与利益相关者保持持续沟通

Maintain continuous communication with stakeholders

影响与成果:在一次重大的全球供应链中断事件中,该系统自动识别受影响的供应商,计算各产品线受到的影响,并协调替代采购策略。代理网络将供应链中断的响应时间缩短了 60%,并将供应链风险事件减少了 40%。最重要的是,它实现了主动风险管理,在影响运营之前就解决了 85% 的潜在中断问题。

Impact and Results During a major global supply chain disruption, the system automatically identified affected suppliers, calculated impact across product lines, and coordinated alternative sourcing strategies. The agent network reduced response time to supply chain disruptions by 60% and decreased supply chain risk incidents by 40%. Most importantly, it enabled proactive risk management, with 85% of potential disruptions addressed before impacting operations.
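The multi-tier impact assessment described above amounts to walking the supply network downstream from a disrupted node. A minimal sketch, assuming a made-up supplier network (the node names and links are not from the case study):

```python
# Illustrative multi-tier impact assessment: given supplier->customer
# links, find everything ultimately affected when one raw-material
# supplier is disrupted. The network below is a hypothetical example.
from collections import deque

supplies = {                     # node -> who it supplies
    "mine-a": ["refiner-1"],
    "refiner-1": ["factory-x", "factory-y"],
    "factory-x": ["product-1"],
    "factory-y": ["product-2"],
    "mine-b": ["refiner-2"],
    "refiner-2": ["factory-z"],
    "factory-z": ["product-3"],
}

def affected(disrupted: str) -> set[str]:
    """Breadth-first walk downstream from the disrupted node."""
    seen, queue = set(), deque([disrupted])
    while queue:
        node = queue.popleft()
        for downstream in supplies.get(node, []):
            if downstream not in seen:
                seen.add(downstream)
                queue.append(downstream)
    return seen

impact = affected("mine-a")
```

A real agent would attach lead times, costs, and alternative sources to each node, but the propagation step is the same traversal.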

2. 销售与收入管理

2. SALES & REVENUE MANAGEMENT

复杂的B2B销售流程编排

Complex B2B Sales Orchestration

业务挑战:一家科技公司在管理涉及多方利益相关者、审批流程冗长且解决方案配置复杂的企业销售周期时,面临着日益复杂的挑战。传统的客户关系管理 (CRM) 系统无法有效协调复杂 B2B 销售流程中众多的接触点和依赖关系。

Business Challenge A technology company faced increasing complexity in managing enterprise sales cycles involving multiple stakeholders, lengthy approval processes, and complex solution configurations. Traditional CRM systems couldn’t effectively coordinate the numerous touchpoints and dependencies in complex B2B sales processes.

代理能力

Agent Capabilities

分析历史交易模式,找出成功因素

Analyze historical deal patterns to identify success factors

协调各销售团队的后续行动

Coordinate follow-up activities across sales teams

生成个性化提案文件

Generate personalized proposal documents

追踪竞争情报和市场动态

Track competitive intelligence and market dynamics

管理管道和预测更新

Manage pipeline and forecast updates

优化区域和客户分配

Optimize territory and account assignments

自动生成日常销售文档

Automate routine sales documentation

影响与成果:该代理系统通过识别成功交易的模式并自动调整互动策略,彻底改变了销售流程。它将销售团队在行政任务上花费的时间减少了 40%,同时将成交率提高了 28%。该系统协调复杂利益相关者沟通和自动生成提案的能力,将销售周期缩短了 40%,使销售团队能够专注于建立关系和进行战略讨论。

Impact and Results The agent system transformed the sales process by identifying patterns in successful deals and automatically adjusting engagement strategies. It reduced the time sales teams spent on administrative tasks by 40% while increasing win rates by 28%. The system’s ability to coordinate complex stakeholder communications and automate proposal generation reduced sales cycle time by 40%, allowing sales teams to focus on relationship building and strategic discussions.

客户增长与留存管理

Account Growth & Retention Management

业务挑战:一家软件即服务公司难以主动识别其客户群中的增长机会和流失风险。传统的客户管理方式严重依赖人工监控和单个客户经理的见解,这使得有效扩展规模和维持稳定的服务水平变得困难。

Business Challenge A software-as-a-service company struggled to proactively identify growth opportunities and churn risks across their customer base. The traditional account management approach relied heavily on manual monitoring and individual account manager insights, making it difficult to scale effectively and maintain consistent service levels.

代理能力

Agent Capabilities

监控客户使用模式和参与度指标

Monitor customer usage patterns and engagement metrics

根据使用趋势确定扩展机会

Identify expansion opportunities based on usage trends

及早发现潜在客户流失的预警信号

Detect early warning signs of potential churn

协调主动宣传和参与活动

Coordinate proactive outreach and engagement activities

生成个性化成长建议

Generate personalized growth recommendations

自动化日常账户管理任务

Automate routine account management tasks

跟踪客户健康评分和成功指标

Track customer health scores and success metrics

影响与成果:该代理系统通过提供早期洞察和协调主动干预措施,彻底革新了客户账户管理。它平均比以往方法提前 60 天识别出高风险账户,并将成功挽留率提高了 45%。该系统还推动了业务增长,通过更及时、更相关的追加销售机会,使拓展收入增长了 35%。由于更积极主动和个性化的互动,客户满意度评分提高了 25%。

Impact and Results The agent system revolutionized account management by providing early insights and coordinating proactive interventions. It identified at-risk accounts an average of 60 days earlier than previous methods and increased successful retention interventions by 45%. The system also drove growth, with a 35% increase in expansion revenue through better-timed and more relevant upsell opportunities. Customer satisfaction scores improved by 25% due to more proactive and personalized engagement.

3. 客户体验与服务

3. CUSTOMER EXPERIENCE & SERVICE

医疗保健访问导航

Healthcare Access Navigation

业务挑战:一家大型医疗保健系统发现,服务不足的人群难以获得现有的医疗保健服务和援助项目。传统的流程要求患者在多个复杂的系统中辗转,填写大量申请表,并与多个机构协调——这造成了就医的巨大障碍。许多符合条件的患者仅仅因为申请流程的复杂性而错失了关键服务。

Business Challenge A major healthcare system identified that underserved populations were struggling to access available healthcare services and assistance programs. The traditional process required patients to navigate multiple complex systems, fill out numerous applications, and coordinate across various agencies - creating significant barriers to care. Many eligible patients were missing out on critical services simply due to the complexity of the application processes.

代理能力

Agent Capabilities

通过对话式访谈了解患者情况

Conduct conversational interviews to understand patient situations

自主从授权来源收集文档

Autonomously gather documentation from authorized sources

根据患者的具体情况,确定合适的项目和服务

Identify suitable programs and services based on patient circumstances

完成并提交多个援助项目的申请

Complete and submit applications across multiple assistance programs

监控申请状态并回复信息请求

Monitor application statuses and respond to information requests

协调交通等实际支持服务

Coordinate practical support services like transportation

通过患者首选的渠道与患者保持清晰的沟通

Maintain clear communication with patients through preferred channels

影响与成果:人工智能代理系统通过扮演智能导航员和倡导者的角色,彻底改变了人们获得医疗服务的途径。当患者提及失业时,系统会自动评估其是否符合多个援助项目的资格,启动申请流程,并协调各项支持服务——所有这些都通过患者首选的沟通渠道及时告知。处理时间从数周缩短至数天,而项目参与率则显著提高。

Impact and Results The AI agent system transformed access to care by acting as an intelligent navigator and advocate. When a patient mentions losing their job, the system automatically evaluates eligibility across multiple assistance programs, initiates applications, and coordinates support services - all while keeping the patient informed through their preferred communication channel. Processing times decreased from weeks to days, while program enrollment rates increased significantly.
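The "job loss triggers eligibility checks" behavior above can be sketched as a rule table evaluated against the patient's updated circumstances. The program names, fields, and thresholds below are invented for illustration; real eligibility rules are far more involved.

```python
# Hypothetical eligibility screen: when a life event (e.g. job loss)
# updates a patient's record, the agent checks them against several
# assistance programs at once. All rules here are made up.

PROGRAMS = {
    "income-assistance": lambda p: p["monthly_income"] < 2000,
    "transport-support": lambda p: p["distance_to_clinic_km"] > 30,
    "medication-subsidy": lambda p: p["monthly_income"] < 2500
                                    and p["on_long_term_medication"],
}

def screen(patient: dict) -> list[str]:
    """Return every program the patient appears eligible for."""
    return [name for name, rule in PROGRAMS.items() if rule(patient)]

patient = {
    "monthly_income": 1500,        # dropped after job loss
    "distance_to_clinic_km": 12,
    "on_long_term_medication": True,
}
eligible = screen(patient)
```

In the deployed system, each match would then kick off an application workflow and patient notification rather than just returning a list.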

银行服务协调

Banking Service Coordination

业务挑战:一家大型零售银行在协调跨多个渠道和产品线的复杂客户服务请求方面面临挑战。传统的银行系统各自独立运作,难以提供无缝服务,尤其是在涉及多个部门或产品的请求时。

Business Challenge A major retail bank struggled with coordinating complex customer service requests across multiple channels and product lines. Traditional banking systems operated in silos, making it difficult to provide seamless service, especially for requests involving multiple departments or products.

代理能力

Agent Capabilities

处理和路由跨渠道的客户咨询

Process and route customer inquiries across channels

协调多个部门的应对措施

Coordinate responses across multiple departments

处理复杂的交易调查

Handle complex transaction investigations

管理欺诈警报和安全措施

Manage fraud alerts and security measures

自动执行例行服务请求

Automate routine service requests

遵守银行监管规定

Maintain compliance with banking regulations

生成个性化的客户沟通信息

Generate personalized customer communications

影响与成果:代理系统通过协调以往各自为政的部门,显著提升了服务效率。抵押贷款处理时间从 45 天缩短至 18 天,可疑活动响应时间缩短了 80%,客户满意度提高了 35%。该系统处理复杂多部门请求的能力,在提高准确性和合规性的同时,将解决时间缩短了 60%。

Impact and Results The agent system dramatically improved service delivery by coordinating across previously siloed departments. Mortgage processing time was reduced from 45 days to 18 days, suspicious activity response time cut by 80%, and customer satisfaction scores increased by 35%. The system’s ability to handle complex multi-department requests reduced resolution times by 60% while improving accuracy and compliance.

保险理赔处理

Insurance Claims Processing

业务挑战:一家大型保险公司在处理涉及多方参与、服务提供商和文件要求的理赔案件时,面临着日益复杂的挑战。传统的理赔处理系统无法有效协调各方利益相关者,也无法适应理赔过程中不断变化的情况。

Business Challenge A large insurance provider faced increasing complexity in managing claims involving multiple parties, service providers, and documentation requirements. Traditional claims processing systems couldn’t effectively coordinate the various stakeholders or adapt to changing circumstances during claims resolution.

代理能力

Agent Capabilities

利用计算机视觉分析索赔文件和照片

Analyze claims documentation and photos using computer vision

交叉参考政策详情和承保范围限制

Cross-reference policy details and coverage limitations

与多个服务提供商协调

Coordinate with multiple service providers

管理理赔员和索赔人之间的沟通

Manage communication between adjusters and claimants

通过网络分析识别潜在的欺诈模式

Identify potential fraud patterns through network analysis

基于多种因素优化结算时间

Optimize settlement timing based on multiple factors

在整个过程中保持合规性

Maintain regulatory compliance throughout the process

影响与成果:代理系统展现了其价值,尤其是在复杂的多方理赔案件中。在一个案例中,该系统协调了五家保险公司、三家维修店和多家医疗机构之间的沟通,确保了清晰的文档记录,并将理赔处理速度提高了 60%。整体理赔处理时间缩短了 40%,准确率提高了 35%,客户满意度提高了 30%。该系统的欺诈检测能力使可疑理赔模式的识别率提高了 25%。

Impact and Results The agent system demonstrated its value, particularly in complex multi-party claims. In one case, it coordinated communications between five insurance companies, three repair shops, and multiple medical providers, maintaining clear documentation and speeding resolution by 60%. Overall claims processing time reduced by 40%, accuracy improved by 35%, and customer satisfaction scores increased by 30%. The system’s fraud detection capabilities led to a 25% increase in identifying suspicious claims patterns.

4. 风险、合规与安全

4. RISK, COMPLIANCE & SECURITY

金融欺诈检测

Financial Fraud Detection

业务挑战:一家全球性金融机构难以检测日益复杂的跨渠道和跨交易类型的欺诈模式。传统的基于规则的欺诈检测系统过于僵化,无法适应不断演变的欺诈手段,且误报率高,浪费了调查人员大量时间。

Business Challenge A global financial institution struggled with detecting increasingly sophisticated fraud patterns across multiple channels and transaction types. Traditional rule-based fraud detection systems were too rigid to adapt to evolving fraud schemes and generated high rates of false positives that consumed investigator time.

代理能力

Agent Capabilities

实时监控所有渠道的交易模式

Monitor transaction patterns across all channels in real-time

关联多个账户和系统中的数据点

Correlate data points across multiple accounts and systems

利用网络分析识别复杂的欺诈模式

Identify complex fraud patterns using network analysis

协调各部门的紧急应对行动

Coordinate immediate response actions across departments

针对复杂案件生成调查资料包

Generate investigation packages for complex cases

根据新模式更新欺诈检测规则

Update fraud detection rules based on new patterns

保留所有检测和响应措施的审计跟踪记录

Maintain audit trails of all detection and response actions

影响与成果:该代理系统通过识别传统系统无法识别的细微模式,彻底革新了欺诈检测方式。在一次事件中,该系统通过识别多个账户和渠道上的细微模式,成功检测出一起协同欺诈企图,并在交易完成前阻止了潜在损失。该系统将误报率降低了 60%,欺诈检测率提高了 35%,并将响应时间从数小时缩短至数分钟。年度欺诈损失减少了 45%,而调查人员的工作效率提高了 40%。

Impact and Results The agent system revolutionized fraud detection by identifying subtle patterns that traditional systems missed. During one incident, it detected a coordinated fraud attempt by recognizing subtle patterns across multiple accounts and channels, preventing potential losses before any transactions were completed. The system reduced false positives by 60%, increased fraud detection rates by 35%, and cut response times from hours to minutes. Annual fraud losses decreased by 45% while investigator efficiency improved by 40%.
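The network-analysis step that catches coordinated fraud can be illustrated with a union-find pass: accounts sharing identifiers (a device, an address) are grouped, and unusually large clusters are flagged for investigation. The identifiers, accounts, and cluster-size threshold below are assumptions for the sketch, not the institution's actual detection logic.

```python
# Illustrative network analysis for fraud detection: group accounts
# that share identifiers, then flag clusters at or above a size
# threshold. Data and threshold are hypothetical.
from collections import defaultdict

def fraud_clusters(accounts: dict[str, set[str]], min_size: int = 3):
    """Group accounts by shared identifiers; return clusters >= min_size."""
    parent = {a: a for a in accounts}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    # Link every pair of accounts that share any identifier.
    by_identifier = defaultdict(list)
    for acct, ids in accounts.items():
        for ident in ids:
            by_identifier[ident].append(acct)
    for members in by_identifier.values():
        for other in members[1:]:
            union(members[0], other)

    clusters = defaultdict(set)
    for acct in accounts:
        clusters[find(acct)].add(acct)
    return [c for c in clusters.values() if len(c) >= min_size]

accounts = {
    "acct1": {"device-x"},
    "acct2": {"device-x", "addr-9"},
    "acct3": {"addr-9"},
    "acct4": {"device-z"},
}
flagged = fraud_clusters(accounts)
```

Note that no single account here looks suspicious on its own; only the shared-identifier structure reveals the ring, which is the point the case study makes about "subtle patterns across multiple accounts".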

监管文件

Regulatory Documentation

业务挑战:一家全球生命科学公司在管理研发和生产过程中复杂的文档方面面临着日益严峻的挑战。该行业高度监管的特性要求其采用一套能够理解GxP指南、生产偏差和合规性细节的复杂系统。传统的文档管理系统无法处理所需的深度法规解读。

Business Challenge A global life sciences company faced mounting challenges in managing complex documentation across research, development, and manufacturing. The highly regulated nature of the industry required a sophisticated system capable of understanding GxP guidelines, manufacturing deviations, and compliance intricacies. Traditional document management systems couldn’t handle the depth of regulatory interpretation needed.

代理能力

Agent Capabilities

解读GxP要求并确定相关政策

Interpret GxP requirements and identify relevant policies

搜索内部数据库和外部监管资源

Search internal databases and external regulatory sources

评估生产偏差并交叉比对以往事件

Assess manufacturing deviations and cross-reference past incidents

编制结构化、符合审计要求的报告

Compile structured, audit-ready reports

保持完整的文档可追溯性

Maintain full documentation traceability

协调各部门的合规工作流程

Coordinate compliance workflows across departments

提醒利益相关者注意潜在的合规问题

Alert stakeholders of potential compliance issues

影响与成果:该系统通过提供智能解读和协调功能,彻底改变了监管文件的管理方式。例如,如果质量保证专家调查一起生产偏差,系统不仅提供原始数据,还能自主识别类似偏差的相关历史案例,交叉引用可能影响解决方案的监管更新,生成一份合规报告,概述风险、建议措施和支持性文件,并在发现模式表明存在需要流程调整的系统性问题时,向关键利益相关者发出警报。初步结果表明,监管文件的管理方式发生了根本性的转变。

Impact and Results The system transformed regulatory documentation management by providing intelligent interpretation and coordination. For example, if a quality assurance specialist investigates a manufacturing deviation, the system doesn’t just provide raw data. It autonomously identifies relevant historical cases from similar deviations, cross-references regulatory updates that may impact resolution protocols, synthesizes a compliance report outlining risks, recommended actions, and supporting documentation, and alerts key stakeholders if patterns indicate systemic issues requiring process adjustments. Early results have demonstrated a fundamental shift in how regulatory documentation is managed.

IT 安全运营

IT Security Operations

业务挑战:一家科技公司在协调其复杂的IT基础设施中的安全响应方面面临挑战。传统的安全工具会生成大量警报,但无法有效地对威胁进行优先级排序,也无法协调跨多个系统的全面响应。

Business Challenge A technology company struggled with coordinating security responses across their complex IT infrastructure. Traditional security tools generated numerous alerts but couldn’t effectively prioritize threats or coordinate comprehensive responses across multiple systems.

代理能力

Agent Capabilities

监控所有平台上的系统活动

Monitor system activities across all platforms

关联安全事件以识别威胁模式

Correlate security events to identify threat patterns

协调立即响应行动

Coordinate immediate response actions

更新跨系统的安全规则

Update security rules across systems

启动漏洞评估

Initiate vulnerability assessments

生成安全事件报告

Generate security incident reports

遵守安全标准

Maintain compliance with security standards

影响与成果:该代理系统在一次潜在的零日攻击事件中展现了其价值,成功识别并遏制了威胁,使其未能影响关键系统。总体而言,事件响应时间缩短了 60%,误报率降低了 75%,系统可用性提高了 45%。该系统能够从过往事件中学习并自动更新安全规则,从而使攻击成功率降低了 40%。

Impact and Results The agent system demonstrated its value during a potential zero-day threat incident, where it identified and contained the threat before it could impact critical systems. Overall, incident response times reduced by 60%, false positive alerts decreased by 75%, and system availability improved by 45%. The system’s ability to learn from past incidents and automatically update security rules led to a 40% reduction in successful breach attempts.

5. 知识工作与分析

5. KNOWLEDGE WORK & ANALYTICS

竞争情报

Competitive Intelligence

业务挑战:一家金融服务机构在监控竞争对手的财务业绩、将市场数据与内部基准进行核对以及为高管决策提供及时洞察方面面临着日益严峻的挑战。传统流程依赖于从盈利报告、行业出版物和财务文件中手动收集数据。随后进行了大量的核对工作,以使外部数据与内部预测模型保持一致。这种方法耗时费力,结果不一致,而且在提供实时信息方面能力有限。

Business Challenge A financial services organization faced increasing challenges in monitoring competitor financial performance, reconciling market data with internal benchmarks, and generating timely insights for executive decision-making. Traditional processes relied on manual data collection from earnings reports, industry publications, and financial filings, followed by extensive reconciliation efforts to align external figures with internal forecasting models. This approach was time-consuming, inconsistent, and limited in its ability to provide real-time intelligence.

代理能力

Agent Capabilities

持续抓取和处理财务报告和收益报表

Continuously scrape and process financial reports and earnings statements

将外部数据与内部模型进行交叉比对

Cross-reference external figures against internal models

对不同报告方法的数据进行标准化

Normalize data across different reporting methodologies

生成结构化的竞争分析报告

Generate structured competitive analysis reports

识别新兴市场趋势和风险

Identify emerging market trends and risks

实时回复高管问询

Provide real-time responses to executive queries

维护历史分析和趋势数据

Maintain historical analysis and trend data

影响与成果:该智能体系统通过提供实时、情境化的洞察,彻底改变了竞争情报的收集方式。与传统自动化不同,该系统不仅汇总财务数据,还能主动解读信息并将其置于特定情境中,在标准报告周期出现异常情况、战略转变和潜在风险之前,就将其识别出来。领导团队可以通过对话式界面与系统互动,无需人工干预即可获取最新的财务对比、竞争定位分析或基于情景的预测。该系统能够随着时间的推移自主优化输出结果,这不仅显著减轻了分析师的负担,还提高了战略洞察的准确性和速度。

Impact and Results The agent system transformed competitive intelligence gathering by providing real-time, contextual insights. Unlike traditional automation, this system does more than aggregate financial data—it actively interprets and contextualizes information, identifying anomalies, strategic shifts, and potential risks before they become evident in standard reporting cycles. Leadership teams can interact with the system through a conversational interface, requesting up-to-date financial comparisons, competitive positioning analyses, or scenario-based forecasts without relying on manual intervention. The system’s ability to autonomously refine its outputs over time has significantly reduced the burden on analysts while improving the accuracy and speed of strategic insights.

市场研究与综合

Market Research & Synthesis

业务挑战:一家咨询公司在收集、分析和整合跨多个行业和数据源的市场调研数据方面面临着越来越大的困难。传统研究方法耗时费力,而且常常忽略了不同市场细分和趋势之间的重要联系。

Business Challenge A consulting firm faced increasing difficulty in gathering, analyzing, and synthesizing market research across multiple industries and data sources. Traditional research methods were time-consuming and often missed important connections across different market segments and trends.

代理能力

Agent Capabilities

从多个公共和专有来源收集数据

Gather data from multiple public and proprietary sources

分析各行业的市场趋势

Analyze market trends across industries

识别新兴机遇和威胁

Identify emerging opportunities and threats

生成全面的市场报告

Generate comprehensive market reports

维护最新的行业知识库

Maintain up-to-date industry knowledge bases

协调各团队的研究工作流程

Coordinate research workflows across teams

创建定制化研究简报

Create customized research briefings

影响与成果:该代理系统彻底革新了公司的研究能力,在更短的时间内提供了更深入、更具关联性的洞察。以往需要数月才能完成的研究,现在只需几周即可完成,且洞察深度提升了40%。该系统识别跨行业模式的能力,为客户带来了多项突破性洞察,最终使咨询业务量增长了35%,客户满意度提升了45%。

Impact and Results The agent system revolutionized the firm’s research capabilities by providing deeper, more connected insights in less time. Research that previously took months is now completed in weeks, with a 40% increase in the depth of insights generated. The system’s ability to identify cross-industry patterns has led to several breakthrough client insights, resulting in a 35% increase in consulting engagements and a 45% improvement in client satisfaction scores.

6. 员工及行政服务

6. EMPLOYEE & ADMINISTRATIVE SERVICES

人力资源运营

HR Operations

业务挑战:一家全球性公司在协调跨多个地区、时区和监管环境的复杂人力资源流程方面面临挑战。传统的人力资源系统各自独立运行,难以提供一致的员工体验,也难以确保跨司法管辖区的合规性。

Business Challenge A global corporation struggled with coordinating complex HR processes across multiple regions, time zones, and regulatory environments. Traditional HR systems operated in silos, making it difficult to provide consistent employee experiences and maintain compliance across jurisdictions.

代理能力

Agent Capabilities

协调端到端的招聘流程

Coordinate end-to-end hiring processes

分析市场薪资数据和技能要求

Analyze market salary data and skill requirements

根据以往的成功经验,撰写有效的职位描述

Craft effective job descriptions based on past success

利用复杂的模式匹配筛选求职申请

Screen applications using sophisticated pattern matching

管理跨时区的复杂面试安排

Manage complex interview scheduling across time zones

精心策划个性化的新员工入职流程

Orchestrate personalized onboarding journeys

监测员工队伍模式,以发现员工流失风险

Monitor workforce patterns for retention risks

识别新出现的技能差距和发展需求

Identify emerging skill gaps and development needs

确保跨多个司法管辖区合规

Ensure compliance across multiple jurisdictions

影响与成果:该代理系统通过协调以往分散的流程,彻底改变了人力资源运营模式。招聘流程速度提升了 45%,候选人质量提高了 30%,早期离职率降低了 25%。该系统能够预测员工流失风险并识别技能发展需求,从而使员工留任率提高了 40%。人力资源团队表示,他们现在能够将 60% 的时间用于战略举措,而不是行政事务。

Impact and Results The agent system transformed HR operations by coordinating previously fragmented processes. Hiring processes accelerated by 45%, candidate quality improved by 30%, and early-stage turnover reduced by 25%. The system’s ability to predict retention risks and identify skill development needs led to a 40% improvement in employee retention. HR teams report spending 60% more time on strategic initiatives rather than administrative tasks.

IT服务管理

IT Service Management

业务挑战:一家跨国公司在管理其全球基础设施中的IT服务请求方面面临挑战。传统的IT服务管理工具无法有效地对请求进行优先级排序、协调不同技术团队的响应,也无法在不同地区维持一致的服务水平。

Business Challenge A multinational company struggled with managing IT service requests across their global infrastructure. Traditional IT service management tools couldn’t effectively prioritize requests, coordinate responses across technical teams, or maintain consistent service levels across regions.

代理能力

Agent Capabilities

根据上下文分析和路由服务请求

Analyze and route service requests based on context

协调多个技术团队的响应

Coordinate responses across multiple technical teams

监控系统性能指标

Monitor system performance metrics

协调软件部署和更新

Orchestrate software deployments and updates

管理访问请求和安全协议

Manage access requests and security protocols

生成绩效和合规性报告

Generate performance and compliance reports

从以往事件中吸取教训,以缩短响应时间

Learn from past incidents to improve response times

维护各区域的服务水平协议

Maintain service level agreements across regions

影响与成果:该代理系统通过智能协调和主动问题解决,彻底革新了IT服务管理。在最近一次云基础设施故障中,该系统协调了数据库故障转移、网络重路由和应用程序扩展,并实时向相关人员通报情况。总体而言,事件解决时间缩短了60%,日常工单量减少了40%,系统可用性提高了45%。更重要的是,IT团队从被动应对故障转变为主动提升系统性能,从而使重大事件减少了35%。

Impact and Results The agent system revolutionized IT service management through intelligent coordination and proactive problem-solving. During a recent cloud infrastructure incident, the system coordinated database failover, network rerouting, and application scaling while keeping stakeholders informed in real-time. Overall incident resolution times reduced by 60%, routine ticket volume decreased by 40%, and system availability improved by 45%. Most significantly, IT teams shifted from reactive firefighting to proactive system enhancement, leading to a 35% reduction in major incidents.

***

***

这些企业应用案例展现了智能体人工智能在不同业务职能领域的变革潜力。尽管每个案例的具体实施情况各不相同,但仍存在一些共同模式:显著提升效率、提高准确性,以及或许最为重要的,将人类工作提升到更具战略意义的层面。我们鼓励各组织以这些案例为起点,开展自身的转型。智能体人工智能的实施机会存在于所有职能部门和行业——关键在于从定义明确的用例入手,这些用例能够清晰地展现其价值,同时增强组织的各项能力和信心。

These enterprise applications demonstrate the transformative potential of agentic AI across diverse business functions. While each implementation is unique, common patterns emerge: significant efficiency gains, improved accuracy, and perhaps most importantly, the elevation of human work to more strategic activities. We encourage organizations to use these cases as starting points for their own transformations. The opportunities for implementing agentic AI exist across all functions and industries - the key is starting with well-defined use cases that can demonstrate clear value while building organizational capabilities and confidence.

第十二章 – 应用案例:个人生产力人工智能代理应用

CHAPTER 12 – Use Cases: Personal Productivity AI Agent Applications

企业应用展现了对组织的深远影响,而个人效率应用则往往能带来人工智能代理最直接、最切实的好处。以下五个案例展示了人工智能代理如何改变个人的工作模式,为希望为更广泛的人工智能采纳积蓄动力的组织提供了切实可行的切入点。每项应用都代表了经过验证的实施方案,这些方案既能保持人的自主性,又能显著提高生产力。

While enterprise applications demonstrate organizational impact, personal productivity applications often provide the most immediate and tangible benefits of agentic AI. These five implementations showcase how AI agents can transform individual work patterns, providing practical starting points for organizations looking to build momentum for broader adoption. Each application represents proven implementations that maintain human agency while significantly enhancing productivity.

1. 电子邮件管理与沟通

1. Email Management & Communication

商业挑战:专业人士每周要花费 15-20 小时管理电子邮件通信,难以应对收件箱过载、回复优先级排序以及与不同利益相关者保持一致的沟通质量。

Business Challenge Professionals were spending 15-20 hours weekly managing email communications, struggling with inbox overload, response prioritization, and maintaining consistent communication quality across different stakeholders.

代理能力

Agent Capabilities

分析收到的电子邮件,判断其紧急程度和重要性

Analyze incoming emails for urgency and importance

根据以往沟通情况,拟定回复稿

Draft contextual responses based on previous communications

确定行动事项和后续要求

Identify action items and follow-up requirements

协调多个线程的响应

Coordinate responses across multiple threads

生成一致的沟通模板

Generate consistent communication templates

与不同利益相关者保持恰当的语气

Maintain appropriate tone across different stakeholders

安排后续跟进并跟踪待回复

Schedule follow-ups and track pending responses

影响与成果:一位市场总监表示,使用该系统后每周节省了 15 小时,并指出:“这就像拥有一个私人助理,他完全了解我的想法和工作方式。”该系统将电子邮件回复时间缩短了 60%,同时提高了回复的质量和一致性。使用该系统的团队表示,他们可用于战略性工作的时间增加了 40%。

Impact and Results A marketing director reported saving 15 hours weekly using the system, noting, “It’s like having a personal assistant who knows exactly how I think and work.” The system reduced email response time by 60% while improving response quality and consistency. Teams using the system reported 40% more time available for strategic work.
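The urgency-and-importance analysis described above can be sketched as a simple scoring triage. The sender lists, keywords, weights, and labels below are illustrative assumptions; a production agent would learn these signals from the user's own communication history.

```python
# Minimal email triage sketch: score each message on sender importance
# and subject urgency, then bucket it. All rules here are hypothetical.

URGENT_WORDS = {"urgent", "asap", "deadline", "today"}
VIP_SENDERS = {"ceo@example.com", "keyclient@example.com"}

def triage(sender: str, subject: str) -> str:
    """Classify an email as 'respond-now', 'respond-today', or 'batch'."""
    score = 0
    if sender.lower() in VIP_SENDERS:
        score += 2
    if any(w in subject.lower() for w in URGENT_WORDS):
        score += 2
    if score >= 4:
        return "respond-now"
    if score >= 2:
        return "respond-today"
    return "batch"

labels = [
    triage("ceo@example.com", "Deadline today on the launch plan"),
    triage("newsletter@example.com", "Weekly digest"),
    triage("partner@example.com", "URGENT: contract question"),
]
```

The agent layer on top of this would go further, drafting context-aware replies for the top bucket, which is where the reported time savings come from.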

2. 日历和会议优化

2. Calendar & Meeting Optimization

商业挑战:专业人士在管理复杂的日程安排、保障专注工作时间和确保会议时间高效利用方面面临诸多挑战。传统的日历工具无法有效平衡各种优先事项,也无法维护工作与生活的界限。

Business Challenge Professionals struggled with managing complex scheduling demands, protecting focused work time, and ensuring productive use of meeting hours. Traditional calendar tools couldn’t effectively balance competing priorities or maintain work-life boundaries.

代理能力

Agent Capabilities

理解并保护工作重点

Understand and protect work priorities

协调跨时区的日程安排

Coordinate scheduling across time zones

保护指定的深度工作时段

Protect designated deep work sessions

优化会议分配

Optimize meeting distributions

生成会议准备材料和会议纪要

Generate meeting preparations and summaries

跟踪后续事项和承诺

Track follow-up items and commitments

保持工作与生活平衡的界限

Maintain work-life balance boundaries

影响与成果:该系统将行政日程安排时间减少了 70%,同时提高了会议效率。一位高管表示:“这就像拥有一个战略助理,他不仅了解我的日程安排,还了解我的工作重点和工作方式。”使用该系统的团队表示,他们有 35% 的时间可以专注于工作,会议负担也减轻了 40%。

Impact and Results The system reduced administrative scheduling time by 70% while improving meeting effectiveness. A senior executive reported that “it’s like having a strategic assistant who understands not just my schedule, but my priorities and work style.” Teams using the system reported 35% more time for focused work and a 40% reduction in meeting overload.
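Protecting deep-work blocks while scheduling can be illustrated by treating protected time as busy and proposing the earliest remaining slot. The hours and intervals below are a simplified, hypothetical single-day example.

```python
# Sketch of boundary-aware scheduling: find the earliest meeting slot
# that avoids both existing meetings and protected deep-work blocks.
# Times are hours on one working day, for simplicity.

def first_free_slot(busy, duration, day=(9, 17)):
    """busy: list of (start, end) intervals; return (start, end) or None."""
    start = day[0]
    for b_start, b_end in sorted(busy):
        if b_start - start >= duration:
            return (start, start + duration)
        start = max(start, b_end)
    if day[1] - start >= duration:
        return (start, start + duration)
    return None

meetings = [(9, 10), (13, 14)]
deep_work = [(10, 12)]  # protected block, treated exactly like a meeting
slot = first_free_slot(meetings + deep_work, duration=1)
```

The key design choice is that protected time is indistinguishable from a hard commitment, so no negotiation logic can schedule over it.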

3. 研究与信息综合

3. Research & Information Synthesis

业务挑战:知识工作者花费大量时间从多个来源收集、分析和整合信息。传统的调研工具无法有效地将不同领域的洞见联系起来,也无法生成针对特定需求的结构化输出。

Business Challenge Knowledge workers spent excessive time gathering, analyzing, and synthesizing information from multiple sources. Traditional research tools couldn’t effectively connect insights across different domains or generate structured outputs tailored to specific needs.

代理能力

Agent Capabilities

从多个授权来源收集信息

Gather information from multiple authorized sources

分析并交叉引用数据点

Analyze and cross-reference data points

生成结构化研究摘要

Generate structured research summaries

识别关键趋势和模式

Identify key trends and patterns

创建自定义报告格式

Create customized report formats

维护源文档

Maintain source documentation

跟踪研究进展和最新动态

Track research progress and updates

建议提供其他相关资源

Suggest relevant additional sources

影响与成果:该智能体系统通过自动化信息收集和整合,显著提升了研究效率。一位顾问指出,这项功能“极大地增强了我利用公司集体智慧的能力,使每个项目都从更坚实的基础出发。”研究时间缩短了60%,而洞察的深度和质量则提高了40%。

Impact and Results The agent system transformed research efficiency by automating information gathering and synthesis. A consultant noted that this capability “transformed my ability to leverage our firm’s collective knowledge, making each project start from a much stronger foundation.” Research time reduced by 60% while the depth and quality of insights improved by 40%.

4. 任务与项目协调

4. Task & Project Coordination

业务挑战:专业人员在管理多个项目、协调依赖关系以及保持各个工作流程的透明度方面面临诸多挑战。传统的项目管理工具无法有效适应不断变化的优先级,也无法协调不同工具和团队之间的工作。

Business Challenge Professionals struggled with managing multiple projects, coordinating dependencies, and maintaining visibility across various workstreams. Traditional project management tools couldn’t effectively adapt to changing priorities or coordinate across different tools and teams.

代理能力

Agent Capabilities

将复杂项目分解为可管理的任务

Break down complex projects into manageable tasks

协调多个工作流程之间的依赖关系

Coordinate dependencies across multiple workstreams

监控截止日期和进度

Monitor deadlines and progress

根据优先级变化调整日程安排

Adjust schedules based on priority changes

生成状态更新和报告

Generate status updates and reports

找出潜在瓶颈

Identify potential bottlenecks

维护项目文档

Maintain project documentation

跟踪资源分配

Track resource allocation

影响与成果:该系统通过智能协调显著提升了项目管理效率。一位产品经理强调:“它追踪了数百个我原本会忽略的细节,让我能够专注于战略决策。”使用该系统的团队项目完成率提高了 40%,战略思考时间也增加了 35%。

Impact and Results The system transformed project management effectiveness through intelligent coordination. One product manager highlighted how “it kept track of hundreds of small details I would have missed, allowing me to focus on strategic decisions.” Teams using the system showed 40% higher project completion rates and reported 35% more time for strategic thinking.
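The dependency coordination and bottleneck spotting described above can be sketched with a topological sort plus a count of transitive dependents. The task graph below is a hypothetical example, not drawn from the case study.

```python
# Sketch of dependency-aware project coordination: order tasks so
# prerequisites come first, then flag the task that blocks the most
# downstream work as the likely bottleneck. Task names are invented.
from graphlib import TopologicalSorter

deps = {                 # task -> set of prerequisite tasks
    "launch": {"qa", "docs"},
    "qa": {"build"},
    "docs": {"build"},
    "build": {"design"},
    "design": set(),
}

order = list(TopologicalSorter(deps).static_order())

def transitive(task: str) -> set[str]:
    """All tasks that `task` depends on, directly or indirectly."""
    out = set(deps[task])
    for p in deps[task]:
        out |= transitive(p)
    return out

def blocked_by(task: str) -> int:
    """How many other tasks cannot finish until `task` is done."""
    return sum(1 for t in deps if task in transitive(t))

bottleneck = max(deps, key=blocked_by)
```

An agent would recompute this whenever priorities shift, which is what lets it surface the "hundreds of small details" before they become missed deadlines.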

5. 文档创建与审核

5. Document Creation & Review

业务挑战:专业人员花费大量时间创建、审查和修改文档,经常难以保持一致性、确保准确性以及管理跨多个利益相关者的版本控制。

Business Challenge Professionals spent significant time creating, reviewing, and revising documents, often struggling with maintaining consistency, ensuring accuracy, and managing version control across multiple stakeholders.

代理能力

Agent Capabilities

生成初始文档草稿

Generate initial document drafts

审查一致性和准确性

Review for consistency and accuracy

跟踪更改和版本

Track changes and versions

协调审查工作流程

Coordinate review workflows

遵守风格指南

Maintain style guidelines

核查引文和参考文献

Check citations and references

生成执行摘要

Generate executive summaries

按标准格式化文档

Format documents to standards

影响与成果:代理系统彻底革新了文档管理工作流程。文档创建时间缩短了 50%,审核周期缩短了 40%。团队成员反馈文档质量和一致性均有所提升,一位经理表示:“这就像拥有了一位精通我们风格指南的专属编辑。” 更重要的是,专业人员表示,他们现在有更多时间专注于高价值内容创作,而不是格式化和行政工作。

Impact and Results The agent system revolutionized document management workflows. Document creation time reduced by 50%, while review cycles shortened by 40%. Teams reported improved document quality and consistency, with one manager noting, “It’s like having a dedicated editor who knows our style guide perfectly.” Most importantly, professionals reported having more time for high-value content creation rather than formatting and administrative tasks.

这些个人效率应用展示了人工智能代理如何改变个人的工作模式,从而腾出时间从事更具战略性和创造性的工作,同时提高产出质量和一致性。所有应用的关键影响不仅在于效率的提升,更在于从根本上改变了专业人士将时间和精力集中在真正创造价值的活动上的方式。

These personal productivity applications demonstrate how AI agents can transform individual work patterns, freeing up time for more strategic and creative tasks while improving output quality and consistency. The key impact across all applications has been not just efficiency gains, but a fundamental shift in how professionals can focus their time and energy on truly value-adding activities.

***

***

这些个人效率应用为企业开启智能体人工智能之旅提供了便捷的切入点。通过从这些个人层面的部署入手,企业可以逐步熟悉并信任人工智能代理,同时为员工创造即时价值。结果始终表明,这不仅能提升效率,更能从根本上改变专业人士的时间和精力分配方式,让他们能够专注于真正创造价值的活动。我们鼓励企业将这些应用视为迈向更广泛的智能体人工智能转型的第一步,并利用已取得的成功经验,为更全面的部署奠定基础。

These personal productivity applications offer accessible entry points for organizations beginning their agentic AI journey. By starting with these individual-level implementations, organizations can build familiarity and confidence with AI agents while delivering immediate value to employees. The results consistently show not just efficiency gains, but a fundamental shift in how professionals can focus their time and energy on truly value-adding activities. We encourage organizations to consider these applications as initial steps toward broader agentic AI transformation, using the demonstrated successes to build momentum for more comprehensive implementations.

Index

A

Agent-as-a-Service 329, 330

Agent Capabilities 373, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 544, 545, 546, 547

Agentic Opportunities 256, 258, 262, 318

Agent-to-Agent Economy 336

AI Accountability 307

AI Agent Progression Framework 501

AI Ethics 397, 398

AI Governance 309, 482

Attention Mechanism 209

Autonomous Agents 71, 299

C

Change Management 357, 371, 384

Circuit breaker 304

Cognitive Diversity 188

Collective Intelligence 187, 481

Computer Use AI 115

Context Window 207, 208

D

Digital Worker 47, 84

E

Episodic Memory 219, 221, 223, 227

Error Handling 136, 303, 507

F

fallback 147, 148, 150, 151, 314, 315

G

Generative AI 8, 44, 45, 311, 415, 417, 418, 419, 420, 499

H

Human-AI Collaboration 282

I

Intelligent Automation 40, 73, 271, 273, 274, 410, 493, 494, 500

L

Large Language Models (LLMs) 169

Long-Term Memory 204, 216, 217, 218, 219, 221, 249

M

Memory Consolidation 225

Meta-Learning 161, 241

Multi-Agent System 90, 96, 99, 133, 185, 186, 190, 191

P

Procedural Memory 220, 221, 223

R

Reasoning Capabilities 185, 186

Reasoning Models 169

S

Scaling AI Agents 403

Semantic Memory 220, 221, 223

Short-Term Memory 204, 206, 212, 214, 249

SPAR Framework 62, 64

T

Three Keystones 124, 127, 128

Tool Resilience Framework 148, 150

Training 169, 170, 333, 366, 367

Transparency in AI 309

U

Universal Basic Income 469

1 Pascal Bornet, Ian Barkin, and Jochen Wirtz, 2020. “INTELLIGENT AUTOMATION: Learn how to harness Artificial Intelligence to boost business & make our world more human”. https://www.amazon.com/INTELLIGENT-AUTOMATION-Artificial-Intelligence-business/dp/B08KTDVHHQ

2 Asana, 2025. “Why Work About Work Is Bad,” Asana, https://asana.com/resources/why-work-about-work-is-bad

3 Cambridge International. “Chapter 4: Innovation and Creativity,” Cambridge International, https://www.cambridgeinternational.org/Images/426483-chapter-4-innovation-and-creativity.pdf

4 Aamer Baig et al., 2024. “Moving Past Gen AI’s Honeymoon Phase: Seven Hard Truths for CIOs to Get From Pilot to Scale,” McKinsey & Company. https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/moving-past-gen-ais-honeymoon-phase-seven-hard-truths-for-cios-to-get-from-pilot-to-scale

5 Asana, 2023. “Anatomy of Work Global Index,” Asana, https://asana.com/resources/anatomy-of-work

6 Matt Gonzales, 2024. “Here’s How Bad Burnout Has Become at Work,” SHRM, https://www.shrm.org/topics-tools/news/inclusion-diversity/burnout-shrm-research-2024

7 Fabrizio Dell’Acqua et al., 2023. “Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality,” SSRN, https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4573321

8 Based on our own research across 167 companies that have implemented LLM-based agents. Details of this research are presented in Chapter 1 of this book.

9 Jared Spataro, 2024. “New Autonomous Agents Scale Your Team Like Never Before,” Microsoft Blog, https://blogs.microsoft.com/blog/2024/10/21/new-autonomous-agents-scale-your-team-like-never-before/

10 Ari Lehavi et al., 2024. “The Rise of the Digital Colleague,” Moody’s, https://www.moodys.com/web/en/us/insights/resources/the-rise-of-the-digital-colleague.pdf

11 Thomas H. Davenport and Peter High, 2024. “How Analytical AI and Gen AI Differ—and When to Use Each,” Harvard Business Review, https://hbr.org/2024/12/how-gen-ai-and-analytical-ai-differ-and-when-to-use-each

12 Ehud Karpas et al., 2022. “MRKL Systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning,” https://arxiv.org/abs/2205.00445

13 Shunyu Yao et al., 2022. “ReAct: Synergizing Reasoning and Acting in Language Models,” https://arxiv.org/abs/2210.03629v3

14 Shunyu Yao et al., 2022. “ReAct: Synergizing Reasoning and Acting in Language Models,” https://arxiv.org/abs/2210.03629v3

15 Timo Schick et al., 2023. “Toolformer: Language Models Can Teach Themselves to Use Tools,” http://arxiv.org/abs/2302.04761

16 Wikipedia contributors, 2025. “AutoGPT,” http://en.wikipedia.org/wiki/AutoGPT

17 Yohei Nakajima, 2024. “Impact of BabyAGI,” yoheinakajima.com, http://yoheinakajima.com/impact-of-babyagi/

18 LangChain, 2025. “Introduction,” http://python.langchain.com/docs/introduction/

19 Microsoft, 2024. “semantic-kernel,” GitHub, http://github.com/microsoft/semantic-kernel

20 OpenAI, 2024. “Function calling,” OpenAI Platform Documentation, http://platform.openai.com/docs/guides/function-calling

21 Shishir G. Patil et al., 2023. “Gorilla: Large Language Model Connected with Massive APIs,” http://arxiv.org/abs/2305.15334

22 Microsoft, 2024. “AutoGen,” Microsoft Research, https://www.microsoft.com/en-us/research/project/autogen/

23 Joon Sung Park et al., 2023. “Generative Agents: Interactive Simulacra of Human Behavior,” https://arxiv.org/abs/2304.03442

24 Deheng Ye et al., 2024. “More Agents Is All You Need,” https://arxiv.org/abs/2402.05120

25 Microsoft, 2024. “AutoGen,” GitHub, https://microsoft.github.io/autogen/stable/

26 The Batch, 2024. “All About Google’s Vertex AI Agent Builder,” deeplearning.ai, https://www.deeplearning.ai/the-batch/all-about-googles-vertex-ai-agent-builder/

27 CrewAI, 2024. “CrewAI Launches Multi-Agentic Platform to Deliver on the Promise of Generative AI for Enterprise,” GlobeNewswire, https://www.globenewswire.com/news-release/2024/10/22/2966872/0/en/CrewAI-Launches-Multi-Agentic-Platform-to-Deliver-on-the-Promise-of-Generative-AI-for-Enterprise.html

28 Statista, 2025. “Market value of agentic artificial intelligence (AI) worldwide 2024 with a forecast for 2030 (in billion U.S. dollars),” Statista, https://www.statista.com/statistics/1552183/global-agentic-ai-market-value/

29 Tom Coshow, 2024. “Intelligent Agent in AI,” Gartner, https://www.gartner.com/en/articles/intelligent-agent-in-ai

30 Sonya Huang et al., 2024. “Generative AI’s Act o1: The Reasoning Era Begins,” Sequoia Capital, https://www.sequoiacap.com/article/generative-ais-act-o1/

31 Lindsey Wilkinson, 2025. “Enterprises eye agentic AI despite readiness gaps and security concerns,” CIO Dive, http://www.ciodive.com/news/enterprise-AI-agent-agentic-autonomous-strategy-challenges/738172/

32 Nicole Deslandes, 2024. “2025 Informed: The Year of Agentic AI,” TechInformed, http://techinformed.com/2025-informed-the-year-of-agentic-ai/

33 Beam AI, 2025. “Hire Self-Learning AI Agents to Run Your Operations - Agentic AI by Beam,” beam.ai, http://beam.ai/

34 Relevance AI, 2025. “Build teams of AI agents that deliver human-quality work,” relevanceai.com, http://relevanceai.com/

35 UiPath, 2025. “Build a path to agentic automation with UiPath Agent Builder,” uipath.com, http://www.uipath.com/product/agent-builder

36 Microsoft, 2025. “Overview of Copilot Studio agent builder,” Microsoft Learn, http://learn.microsoft.com/en-us/microsoft-365-copilot/extensibility/copilot-studio-agent-builder

37 CrewAI, 2025. http://www.crew.ai

38 ServiceNow, 2025. “Virtual Agent,” http://www.servicenow.com/products/virtual-agent.html

39 Langchain, 2025. http://www.langchain.com/

40 AutogenAI, 2025. http://autogenai.com

41 OpenAI, 2025. “Introducing Operator,” http://openai.com/index/introducing-operator/

42 Anthropic, 2025. “Computer Use,” http://docs.anthropic.com/en/docs/build-with-claude/computer-use

43 Google DeepMind, 2024. “Project Mariner,” Google DeepMind, http://deepmind.google/technologies/project-mariner/

44 Salesforce, 2025. “Agentforce,” Salesforce, http://www.salesforce.com/agentforce/

45 Hippocratic AI, 2025. “Hippocratic AI,” http://www.hippocraticai.com/

46 Larry Dignan, 2024. “Agentic AI: Three Themes to Watch in 2025,” Constellation Research, http://www.constellationr.com/blog-news/insights/agentic-ai-three-themes-watch-2025

47 Agent.ai, 2025. http://agent.ai/

48 See recent research on AI concierges and how they are expected to transform customer journeys: Liu, S.Q., Vakeel, K.A., Smith, N.A., Alavipour, R.S., Wei, C.(V). and Wirtz, J., 2024. “AI concierge in the customer journey: what is it and how can it add value to the customer?”, Journal of Service Management. http://doi.org/10.1108/JOSM-12-2023-0523

49 Hayden Field, 2024. “After ChatGPT and the rise of chatbots, investors pour into AI agents,” CNBC, http://www.cnbc.com/2024/06/07/after-chatgpt-and-the-rise-of-chatbots-investors-pour-into-ai-agents.html

50 LangChain, 2024. “The State of AI Agents,” http://www.langchain.com/stateofaiagents

51 LangChain, 2024. “Perplexity: An AI answer engine that lets you handle complex query searches like a Pro,” http://www.langchain.com/breakoutagents/perplexity

52 LangChain, 2024. “Building an AI tour guide that helps users navigate Ramp’s platform for financial operations,” http://www.langchain.com/breakoutagents/ramp

53 LangChain, 2024. “Superhuman: Navigate your inbox and calendar in a flash, with an AI-powered search assistant for emails,” http://www.langchain.com/breakoutagents/superhuman

54 LangChain, 2024. “Transforming how users build software from scratch, to code, to application with Replit Agent,” http://www.langchain.com/breakoutagents/replit

55 Open Access: All figures in this book are distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction under the conditions detailed here: http://creativecommons.org/licenses/by/4.0/.

56 Cade Metz, 2024. “When Self-Driving Cars Don’t Actually Drive Themselves,” The New York Times, September 11, 2024, http://www.nytimes.com/2024/09/11/insider/when-self-driving-cars-dont-actually-drive-themselves.html

57 Wikipedia contributors, 2025. “Tesla Autopilot,” http://en.wikipedia.org/wiki/Tesla_Autopilot

58 Jameson Dow, 2024. “Waymo starts fully autonomous rides in LA tomorrow; Austin later this year,” Electrek, http://electrek.co/2024/03/13/waymo-starts-fully-autonomous-rides-in-la-tomorrow-austin-later-this-year/

59 Kyle Swanson et al., 2024. “The Virtual Lab: AI Agents Design New SARS-CoV-2 Nanobodies with Experimental Validation,” bioRxiv, November 12, 2024, http://www.biorxiv.org/content/10.1101/2024.11.11.623004v1.full.pdf

60 Mehmet Uzgoren et al., 2024. “Examination of AI Enhanced Distributed Systems and its Effects on Software Engineering,” 13th London International Conference, July 24-26, 2024, http://londonic.uk/js/index.php/plic/article/download/240/261/820

61 Smythos, 2024. “What Are Multi-agent AI Systems?” http://smythos.com/ai-agents/multi-agent-systems/multi-agent-ai-systems/

62 Dave Andre, 2025. “What is Contract Net Protocol?” All About AI, http://www.allaboutai.com/ai-glossary/contract-net-protocol/

63 D. Jarne Ornia, 2023. “Efficient Control for Cooperation: Communication, Learning and Robustness in Multi-Agent Systems,” Delft University of Technology, April 24, 2023, http://research.tudelft.nl/en/publications/efficient-control-for-cooperation-communication-learning-and-robu

64 Hanmo Chen et al., 2023. “Emergent collective intelligence from massive-agent cooperation and competition,” http://arxiv.org/abs/2301.01609

65 Meta Fundamental AI Research Diplomacy Team (FAIR) et al., 2022. “Human-level play in the game of Diplomacy by combining language models with strategic reasoning,” Science, December 9, 2022, http://noambrown.github.io/papers/22-Science-Diplomacy-TR.pdf

66 Raphael Shu et al., 2025. “Unlocking complex problem-solving with multi-agent collaboration on Amazon Bedrock,” AWS Machine Learning Blog, http://aws.amazon.com/blogs/machine-learning/unlocking-complex-problem-solving-with-multi-agent-collaboration-on-amazon-bedrock/

67 Daniel Dominguez, 2024. “New LangChain Report Reveals Growing Adoption of AI Agents,” InfoQ, http://www.infoq.com/news/2024/12/ai-agents-langchain/

68 Pegasystems Inc., 2025. “Workers Embrace Agentic AI Despite Concerns About Trust and Reliability, Says Research,” NASDAQ, http://www.nasdaq.com/press-release/workers-embrace-agentic-ai-despite-concerns-about-trust-and-reliability-says-research

69 Emily Frith et al., 2021. “Intelligence and creativity share a common cognitive and neural basis,” Cerebral Cortex, 31(12), 5523-5537, http://pubmed.ncbi.nlm.nih.gov/33119355/

70 Anthropic, 2025. “Computer Use,” http://docs.anthropic.com/en/docs/build-with-claude/computer-use

71 Google DeepMind, 2024. “Project Mariner,” http://deepmind.google/technologies/project-mariner/

72 OpenAI, 2025. “Introducing Operator,” http://openai.com/index/introducing-operator/

73 Nick Bostrom, 2005. “Ethical Issues in Advanced Artificial Intelligence,” Future of Humanity Institute, Oxford University, http://www.fhi.ox.ac.uk/wp-content/uploads/ethical-issues-in-advanced-ai.pdf

74 Wikipedia contributors, 2024. “Instrumental Convergence,” http://en.wikipedia.org/wiki/Instrumental_convergence

75 Decisionproblem, 2025, http://www.decisionproblem.com/paperclips

76 Stephen M. Walker II, “HumanEval Benchmark.” http://klu.ai/glossary/humaneval-benchmark

77 Yifan Mai and Percy Liang, 2024. “Massive Multitask Language Understanding (MMLU) on HELM.” http://crfm.stanford.edu/2024/05/01/helm-mmlu.html

78 Xiao Liu et al., 2023. “AgentBench: Evaluating LLMs as Agents.” http://arxiv.org/pdf/2308.03688v1

79 Zhengliang Shi et al., 2024. “Learning to Use Tools via Cooperative and Interactive Agents.” http://arxiv.org/abs/2403.03031

80 Ao Li et al., 2024. “Agent-Oriented Planning in Multi-Agent Systems.” http://arxiv.org/abs/2410.02189

81 John-Anthony Disotto, 2025. “OpenAI’s Deep Research smashes records for the world’s hardest AI exam, with ChatGPT o3-mini and DeepSeek left in its wake.” http://www.techradar.com/computing/artificial-intelligence/openais-deep-research-smashes-records-for-the-worlds-hardest-ai-exam-with-chatgpt-o3-mini-and-deepseek-left-in-its-wake

82 Tula Masterman et al., 2024. “The Landscape of Emerging AI Agent Architectures for Reasoning, Planning, and Tool Calling: A Survey.” http://arxiv.org/abs/2404.11584

83 Jingqing Ruan et al., 2023. “TPTU: Large Language Model-based AI Agents for Task Planning and Tool Usage.” http://arxiv.org/abs/2308.03427

84 Atty Eleti, Jeff Harris, and Logan Kilpatrick, 2023. “Function calling and other API updates.” http://openai.com/index/function-calling-and-other-api-updates/

85 Timo Schick et al., 2023. “Toolformer: Language Models Can Teach Themselves to Use Tools.” http://arxiv.org/abs/2302.04761

86 Jason Wei et al., 2022. “Emergent Abilities of Large Language Models.” http://arxiv.org/abs/2206.07682

87 Andreas Tsamados et al., 2024. “Human control of AI systems: from supervision to teaming.” http://link.springer.com/article/10.1007/s43681-024-00489-4

88 David De Cremer et al., 2021. “AI Should Augment Human Intelligence, Not Replace It.” Harvard Business Review. http://hbr.org/2021/03/ai-should-augment-human-intelligence-not-replace-it

89 Jose N. Paredes et al., 2021. “On the Importance of Domain-Specific Explanations in AI-based Cybersecurity Systems” (Technical Report). http://arxiv.org/abs/2108.02006

90 Cyril Amblard-Ladurantie, 2024. “Will AI Replace Cybersecurity Experts? The Human Vs. AI Debate.” MEGA. http://www.mega.com/blog/will-ai-replace-cybersecurity-experts-human-vs-ai-debate

91 Raihan Khan et al., 2024. “Security Threats in Agentic AI System.” http://arxiv.org/abs/2410.14728

92 Chelsea Finn et al., 2017. “Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks.” http://arxiv.org/abs/1703.03400

93 David Silver et al., 2017. “Mastering the Game of Go without Human Knowledge.” Nature 550: 354–359. http://doi.org/10.1038/nature24270

94 Daniel Kahneman, 2011. “Thinking, Fast and Slow.” New York: Farrar, Straus and Giroux.

95 OpenAI, 2024. “Learning to reason with LLMs.” http://openai.com/index/learning-to-reason-with-llms

96 OpenAI, 2024. “Learning to reason with LLMs.” http://openai.com/index/learning-to-reason-with-llms/

97 Emily M. Bender et al., 2021. “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?” Association for Computing Machinery. http://dl.acm.org/doi/10.1145/3442188.3445922

98 Tom B. Brown 等人,2020 年。“语言模型是少样本学习器。” http://arxiv.org/abs/2005.14165

98 Tom B. Brown, et al., 2020. “Language Models are Few-Shot Learners.” http://arxiv.org/abs/2005.14165

99 Yudi Pawitan 和 Chris Holmes,2024 年。“大型语言模型推理的置信度。” http://arxiv.org/abs/2412.15296

99 Yudi Pawitan and Chris Holmes, 2024. “Confidence in the Reasoning of Large Language Models.” http://arxiv.org/abs/2412.15296

100. Siyuan Wang 等人,2024 年。“符号工作记忆增强了复杂规则应用的语言模型。” http://arxiv.org/abs/2408.13654

100 Siyuan Wang, et al., 2024. “Symbolic Working Memory Enhances Language Models for Complex Rule Application.” http://arxiv.org/abs/2408.13654

101 Philipp Mondorf 和 Barbara Plank,2024 年。“超越准确率:评估大型语言模型的推理行为——一项调查。” http://arxiv.org/abs/2404.01869

101 Philipp Mondorf and Barbara Plank, 2024. “Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models -- A Survey.” http://arxiv.org/abs/2404.01869

102 Allen Newell 和 Herbert Alexander Simon,1972 年。“人类问题解决”。新泽西州恩格尔伍德克利夫斯:Prentice-Hall。

102 Allen Newell and Herbert Alexander Simon, 1972. “Human Problem Solving”. Englewood Cliffs, NJ: Prentice-Hall.

103 Pat Langley 等人,1987 年。“科学发现:创造过程的计算探索”。马萨诸塞州剑桥:麻省理工学院出版社。

103 Pat Langley, et al., 1987. “Scientific Discovery: Computational Explorations of the Creative Process”. Cambridge, MA: MIT Press.

104 Kent F. Hubert 等人,2024 年。“当前人工智能生成语言模型在发散思维任务上比人类更具创造力。”《自然·科学报告》14。http ://www.nature.com/articles/s41598-024-53303-w

104 Kent F. Hubert, et al., 2024. “The current state of artificial intelligence generative language models is more creative than humans on divergent thinking tasks.” Nature Scientific Reports 14. http://www.nature.com/articles/s41598-024-53303-w

105 Patrick Haluptzok 等人,2023 年。“语言模型可以自我学习,从而更好地编程。” http://arxiv.org/abs/2207.14502

105 Patrick Haluptzok, et al., 2023. “Language Models Can Teach Themselves to Program Better.” http://arxiv.org/abs/2207.14502

106 Mahmood Hegazy,2025。“思维多样性在多智能体辩论框架中激发更强的推理能力。”《机器人与自动化研究杂志》,5(3)。http ://arxiv.org/abs/2410.12853

106 Mahmood Hegazy, 2025. “Diversity of Thought Elicits Stronger Reasoning Capabilities in Multi-Agent Debate Frameworks.” Journal of Robotics and Automation Research, 5(3). http://arxiv.org/abs/2410.12853

107郝天王等人,2024。“学习打破:多智能体辩论系统中的知识增强推理。” http://arxiv.org/abs/2312.04854

107 Haotian Wang, et al., 2024. “Learning to Break: Knowledge-Enhanced Reasoning in Multi-Agent Debate System.” http://arxiv.org/abs/2312.04854

108 Yilun Du 等人,2023 年。“通过多智能体辩论改进语言模型的事实性和推理能力。” http://arxiv.org/abs/2305.14325

108 Yilun Du, et al., 2023. “Improving Factuality and Reasoning in Language Models through Multiagent Debate.” http://arxiv.org/abs/2305.14325

109 Mahmood Hegazy,2024。“思维多样性在多智能体辩论框架中激发更强的推理能力。” http://arxiv.org/abs/2410.12853

109 Mahmood Hegazy, 2024. “Diversity of Thought Elicits Stronger Reasoning Capabilities in Multi-Agent Debate Frameworks.” http://arxiv.org/abs/2410.12853

110 Scott E. Page,2007 年。“差异:多样性的力量如何创造更好的群体、公司、学校和社会。”普林斯顿,新泽西州:普林斯顿大学出版社。

110 Scott E. Page, 2007. “The Difference: How the Power of Diversity Creates Better Groups, Firms, Schools, and Societies.” Princeton, NJ: Princeton University Press.

111 Perplexity AI. 2024. “什么是人工智能中的涌现行为?” http://www.perplexity.ai/page/what-is-emergent-behavior-in-a-cJ0gTqN7QX.wqxLltcqiWw

111 Perplexity AI. 2024. “What Is Emergent Behavior in AI?” http://www.perplexity.ai/page/what-is-emergent-behavior-in-a-cJ0gTqN7QX.wqxLltcqiWw

112 Swarnadeep Saha 等人,2023 年。“语言模型能否教会能力较弱的智能体?教师讲解通过个性化提升学生能力。” http://arxiv.org/abs/2306.09299

112 Swarnadeep Saha, et al., 2023. “Can Language Models Teach Weaker Agents? Teacher Explanations Improve Students via Personalization.” http://arxiv.org/abs/2306.09299

113 Adam Fourney 等人,2024 年。“Magentic-One:用于解决复杂任务的通用多智能体系统。” http://arxiv.org/abs/2411.04468

113 Adam Fourney, et al., 2024. “Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks.” http://arxiv.org/abs/2411.04468

114 Yilun Du 等人,2023 年。“通过多智能体辩论改进语言模型的事实性和推理能力。” http://arxiv.org/abs/2305.14325

114 Yilun Du, et al., 2023. “Improving Factuality and Reasoning in Language Models through Multiagent Debate.” http://arxiv.org/abs/2305.14325

115 Hongyu Li, Yilun Liu, and Jun Yan, 2025. “Position: Emergent Machina Sapiens Urge Rethinking Multi-Agent Paradigms.” http://arxiv.org/abs/2502.04388

116 Meir Kalech and Avraham Natan, 2022. “Model-Based Diagnosis of Multi-Agent Systems: A Survey.” Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI-22), 12334-12341. http://cdn.aaai.org/ojs/21498/21498-13-25511-1-2-20220628.pdf

117 Ciaran Regan, Alexandre Gournail, and Mizuki Oka, 2024. “Problem-Solving in Language Model Networks.” http://arxiv.org/abs/2406.12374

118 CNS Nevada. 2024. “What Is the Memory Capacity of a Human Brain?” http://www.cnsnevada.com/what-is-the-memory-capacity-of-a-human-brain/

119 Larry R. Squire and Stuart Zola-Morgan, 1991. “The Medial Temporal Lobe Memory System.” Science 253, no. 5026: 1380–1386. http://doi.org/10.1126/science.1896849

120 Eduardo Camina and Francisco Güell, 2017. “The Neuroanatomical, Neurophysiological and Psychological Basis of Memory: Current Models and Their Origins.” Frontiers in Pharmacology 8:438. http://doi.org/10.3389/fphar.2017.00438

121 Jarrad A. G. Lum and Gina Conti-Ramsden, 2013. “Long-term memory: A review and meta-analysis of studies of declarative and procedural memory in specific language impairment.” http://pmc.ncbi.nlm.nih.gov/articles/PMC3986888/

122 Elizabeth F. Loftus, 1975. “Leading Questions and the Eyewitness Report.” Cognitive Psychology 7, no. 4: 560–572. http://doi.org/10.1016/0010-0285(75)90023-7.

123 George A. Miller, 1956. “The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information.” Psychological Review 63, no. 2: 81–97. http://doi.org/10.1037/h0043158

124 Jiaming Tang, et al., 2024. “Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference.” ResearchGate. http://www.researchgate.net/publication/381484873_Quest_Query-Aware_Sparsity_for_Efficient_Long-Context_LLM_Inference

125 Nelson F. Liu, et al., 2023. “Lost in the Middle: How Language Models Use Long Contexts.” http://cs.stanford.edu/~nfliu/papers/lost-in-the-middle.arxiv2023.pdf

126 Ashish Vaswani, et al., 2017. “Attention is All You Need.” Advances in Neural Information Processing Systems 30. http://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf

127 Arash Mohtashami and Martin Jaggi, 2023. “Landmark Attention: Random-Access Infinite Context Length for Transformers.” http://arxiv.org/abs/2305.16300

128 Qiming Zhang, et al., 2022. “VSA: Learning Varied-Size Window Attention in Vision Transformers.” http://arxiv.org/abs/2204.08446

129 Di Liu, et al., 2024. “RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval.” http://arxiv.org/abs/2409.10516

130 Hyungho Na, et al., 2024. “Efficient Episodic Memory Utilization of Cooperative Multi-Agent Reinforcement Learning.” http://arxiv.org/abs/2403.01112

131 Ali Behrouz, et al., 2024. “Titans: Learning to Memorize at Test Time.” https://arxiv.org/abs/2501.00663

132 Richard S. Sutton and Andrew G. Barto, 2018. “Reinforcement Learning: An Introduction.” 2nd ed. Cambridge, MA: MIT Press.

133 Timothy Hospedales, et al., 2020. “Meta-Learning in Neural Networks: A Survey.” https://arxiv.org/abs/2004.05439

134 Álvaro G. Díaz and Hugues Bersini, 2020. “Self-Optimisation of Dense Neural Network Architectures: An Incremental Approach.” 2020 International Joint Conference on Neural Networks (IJCNN). https://doi.org/10.1109/IJCNN48605.2020.9207416

135 Zhenhao Shuai, et al., 2023. “A Self-adaptive Neuroevolution Approach to Constructing Deep Neural Network Architectures Across Different Types.” https://arxiv.org/abs/2211.14753

136 Pascal Bornet, Ian Barkin, and Jochen Wirtz, 2020. “Intelligent Automation: Learn how to harness Artificial Intelligence to boost business & make our world more human,” https://www.amazon.com/INTELLIGENT-AUTOMATION-Artificial-Intelligence-business/dp/B08KTDVHHQ

137 Galileo AI, n.d. “Agent Leaderboard,” https://huggingface.co/spaces/galileo-ai/agent-leaderboard

138 Karthik Narasimhan, 2024. “Benchmarking AI Agents,” https://sierra.ai/blog/benchmarking-ai-agents

139 OpenAI, n.d. “Production Best Practices,” https://platform.openai.com/docs/guides/production-best-practices

140 Google, n.d. “APIs and reference,” https://cloud.google.com/generative-ai-app-builder/docs/apis

141 Maira Ladeira Tanke, et al., 2024. “Best practices for building robust generative AI applications with Amazon Bedrock Agents – Part 1,” https://aws.amazon.com/blogs/machine-learning/best-practices-for-building-robust-generative-ai-applications-with-amazon-bedrock-agents-part-1/

142 Weaviate, 2025. https://weaviate.io

143 Pinecone, 2025. https://www.pinecone.io/product/

144 Stanford Autonomous Agents Lab, 2025. https://www.autonomousagents.stanford.edu

145 Michelle Pokrass, 2024. “Introducing Structured Outputs in the API,” https://openai.com/index/introducing-structured-outputs-in-the-api/

146 Stephen Collins, 2024. “Introducing JSON Schemas for AI Data Integrity,” https://stephencollins.tech/posts/introducing-json-schemas-for-ai-data-integrity

147 OpenAI, n.d. “Safety Best Practices,” https://platform.openai.com/docs/guides/safety-best-practices

148 Camunda, n.d. “BPMN Workflow Engine,” https://camunda.com/platform-7/workflow-engine/

149 James Beswick, 2024. “Operating Lambda: Understanding event-driven architecture – Part 1,” https://aws.amazon.com/blogs/compute/operating-lambda-understanding-event-driven-architecture-part-1/

150 Observe, n.d. “GCP Cloud Functions,” https://docs.observeinc.com/en/latest/content/integrations/gcp/cloud-functions.html

151 Microsoft, 2022. “Microsoft Responsible AI Standard v2: General Requirements,” https://blogs.microsoft.com/wp-content/uploads/prod/sites/5/2022/06/Microsoft-Responsible-AI-Standard-v2-General-Requirements-3.pdf

152 OpenAI, n.d. “Safety Best Practices,” https://platform.openai.com/docs/guides/safety-best-practices

153 Glean, 2024. “A Comprehensive Guide to Information Retrieval in 2024,” https://www.glean.com/blog/glean-information-retrieval-2024

154 Wikipedia contributors, 2025. “Decision tree learning,” https://en.wikipedia.org/wiki/Decision_tree_learning

155 Mage, 2024. “Machine Learning (ML) Applications: Ranking,” https://dev.to/mage_ai/machine-learning-ml-applications-ranking-238d

156 Aman Anand Rai, 2023. “6 Explainable AI (XAI) Frameworks for Transparency in AI,” https://dev.to/amananandrai/6-explainable-ai-xai-frameworks-for-transparency-in-ai-3koj

157 EUAIACT contributors, 2025. “Key Issues Transparency Obligations,” https://www.euaiact.com/key-issue/5

158 API7.ai, 2024. “What’s New in API7 Enterprise 3.2.2: Audit Logging,” https://api7.ai/blog/api7-3.2.2-audit-logging

159 Vrushank Vyas, 2025. “Beyond Implementation: Why Audit Logs Are Critical for Enterprise AI Governance,” https://portkey.ai/blog/beyond-implementation-why-audit-logs-are-critical-for-enterprise-ai-governance/

160 Langchain contributors, 2025. “Testing,” https://python.langchain.com/docs/concepts/testing/

161 Aaditya Ura, Pasquale Minervini, Clémentine Fourrier, 2024. “The Open Medical-LLM Leaderboard: Benchmarking Large Language Models in Healthcare,” https://huggingface.co/blog/leaderboard-medicalllm

162 Lama Ahmad et al., 2024. “OpenAI’s Approach to External Red Teaming,” https://cdn.openai.com/papers/openais-approach-to-external-red-teaming.pdf

163 Langchain contributors, 2025. “LangSmith,” https://www.langchain.com/langsmith

164 Drew Robbins, Liudmila Molkova, 2024. “OpenTelemetry for Generative AI,” https://opentelemetry.io/blog/2024/otel-generative-ai/

165 Amazon Web Services, 2025. https://aws.amazon.com/autoscaling/features/

166 Google Cloud, 2025. “AI Infrastructure,” https://cloud.google.com/ai-infrastructure?hl=en

167 NVIDIA, 2025. “Installing AI and Data Science Applications and Frameworks,” https://docs.nvidia.com/ai-enterprise/deployment/bare-metal/latest/installing-ai.html

168 Wikipedia contributors, 2025. “Turing test,” https://en.wikipedia.org/wiki/Turing_test

169 Enso, 2025. “Enso,” https://enso.bot

170 Fiverr, 2025. “Fiverr,” https://www.fiverr.com/go

171 Taskade, 2025. “Taskade AI,” https://www.taskade.com/ai/app

172 Joel Khalili, 2024. “The Edgelord AI That Turned a Shock Meme Into Millions in Crypto,” https://www.wired.com/story/truth-terminal-goatse-crypto-millionaire?utm_source=chatgpt.com

173 Jose Antonio Lanz, 2024. “Marc Andreessen Sends $50K in Bitcoin to an AI Bot on Twitter,” https://decrypt.co/239340/marc-andreessen-sends-50k-in-bitcoin-to-an-ai-bot-on-twitter?utm_source=chatgpt.com

174 Project Reylo contributors, 2024. “Terminal of Truths: The AI That Became a Crypto Millionaire,” https://www.projectreylo.com/post/terminal-of-truths-the-ai-that-became-a-crypto-millionaire?utm_source=chatgpt.com

175 Joel Khalili, 2024. “The Edgelord AI That Turned a Shock Meme Into Millions in Crypto,” https://www.wired.com/story/truth-terminal-goatse-crypto-millionaire/?utm_source=chatgpt.com

176 David De Cremer, 2024. “AI Transformation Requires a Total Team Effort: Including Rank-and-File Employees in AI Adoption Improves Overall Performance,” Harvard Business Review, May-June, pp. 124-131.

177 Ben Dickson, 2022. “AI’s J-curve and upcoming productivity boom,” https://bdtechtalks.com/2022/01/31/ai-productivity-j-curve/

178 Tom Relihan, 2019. “A calm before the AI productivity storm,” https://mitsloan.mit.edu/ideas-made-to-matter/a-calm-ai-productivity-storm

179 David De Cremer, 2024. “The AI-Savvy Leader: Nine Ways to Take Back Control and Make AI Work.”

180 Klarna, 2024. “Klarna AI assistant handles two-thirds of customer service chats in its first month,” https://www.klarna.com/international/press/klarna-ai-assistant-handles-two-thirds-of-customer-service-chats-in-its-first-month/

181 Eugene Mandel, 2024. “Our Head of AI Puts Klarna’s Chatbot to the Test,” https://loris.ai/blog/our-head-of-ai-puts-klarnas-chatbot-to-the-test/

182 Ryan Hogg, 2024. “Klarna Has 1,800 Employees It Hopes AI Will Render Obsolete,” https://fortune.com/europe/2024/08/28/klarna-1800-employees-ai-replace-ipo/

183 Rachel Konyefa Dickson, 2023. “Analysis of The Traditional Leadership Theories: A Review of Contemporary Leadership Approaches and Management Effectiveness,” Information and Knowledge Management, 13(5): 9-21. https://www.iiste.org/Journals/index.php/IKM/article/viewFile/61330/63314

184 SSRN Electronic Journal, 2023. “A Theoretical Evaluation on Traditional Leadership Approaches,” https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4297450

185 Wen Duan et al., 2024. “Understanding the Evolvement of Trust Over Time within Human-AI Teams,” Proceedings of the ACM on Human-Computer Interaction, 8(CSCW2), Article 521. https://doi.org/10.1145/3687060

186 Kevin Anthony Hoff and Masooda Bashir, 2015. “Trust in Automation: Integrating Empirical Evidence on Factors That Influence Trust,” Human Factors, 57(3), 407–434. https://doi.org/10.1177/0018720814547570

187 David De Cremer, 2024. “The AI-Savvy Leader: Nine Ways to Take Back Control and Make AI Work.”

188 Adam Gleave and Euan McLean, 2023. “AI Safety in a World of Vulnerable Machine Learning Systems,” https://www.alignmentforum.org/posts/ncsxcf8CkDveXBCrA/ai-safety-in-a-world-of-vulnerable-machine-learning-systems-1

189 T. Saritha and P. Akthar, “The Impact of Hybrid Work Models on Employee Well-being and Engagement,” Communications on Applied Nonlinear Analysis, 31, no. 5s (2024): https://internationalpubls.com/index.php/cana/article/download/1003/707/1856

190 David De Cremer and Garry Kasparov, 2021. “AI Should Augment Human Intelligence, Not Replace It,” https://hbr.org/2021/03/ai-should-augment-human-intelligence-not-replace-it

191 McKinsey & Company, 2024. “Gen AI in Corporate Functions: Looking Beyond Efficiency Gains,” https://www.mckinsey.com/capabilities/operations/our-insights/gen-ai-in-corporate-functions-looking-beyond-efficiency-gains

192 Thomas Davenport and Randy Bean, 2023. “AI Ethics at Unilever: From Policy to Process,” https://sloanreview.mit.edu/article/ai-ethics-at-unilever-from-policy-to-process/

193 Personal AI Concierges are expected to be implemented across many service sectors, see Stephanie Liu, Khadija Ali Vakeel, Nicholas Smith, Roya Sadat Alavipour, Chunhao (Victor) Wei, Jochen Wirtz (2024), “AI Concierge in the Customer Journey: What Is It and How Can It Add Value to the Customer?” Journal of Service Management, Vol. 35, No. 6, 136-158, https://doi.org/10.1108/JOSM-12-2023-0523.

194 Pascal Bornet, 2024. “IRREPLACEABLE: The Art of Standing Out in the Age of Artificial Intelligence,” https://irreplaceable.ai/

195 Thomas H. Davenport and Julia Kirby, 2016. “Only Humans Need Apply: Winners and Losers in the Age of Smart Machines,” https://www.amazon.com/Only-Humans-Need-Apply-Machines/dp/0062438611

196 Pascal Bornet, 2024. “IRREPLACEABLE: The Art of Standing Out in the Age of Artificial Intelligence,” https://irreplaceable.ai/

197 Pasi Sahlberg, 2021. “Finnish Lessons 3.0: What Can the World Learn from Educational Change in Finland?”

198 Alessia Lalomia and Antonia Cascales-Martínez, 2023. “Social-emotional Skills Development: The Design of a Project in a Danish School,” https://doi.org/10.18662/rrem/15.2/726

199 SkillsFuture Singapore Agency, 2024. “Annual Report 2024: Levelling-up the Skills Ecosystem”

200 Joseph E. Aoun, 2017. “Robot-Proof: Higher Education in the Age of Artificial Intelligence”

201 Gallup, 2024. “State of the Global Workplace: 2024 Report,” https://www.gallup.com/workplace/349484/state-of-the-global-workplace.aspx

202 U.S. Bureau of Labor Statistics, 2024. “Bookkeeping, Accounting, and Auditing Clerks,” Occupational Outlook Handbook. https://www.bls.gov/ooh/Office-and-Administrative-Support/Bookkeeping-accounting-and-auditing-clerks.htm

203 ILO (International Labour Organization), 2023. “Nearly 3 million people die of work-related accidents and diseases,” https://www.ilo.org/resource/news/nearly-3-million-people-die-work-related-accidents-and-diseases

204 Dietmar Elsler, Jukka Takala, and Jouko Remes, 2019. “An International Comparison of the Cost of Work-Related Accidents and Illnesses,” European Agency for Safety and Health at Work (EU-OSHA). https://osha.europa.eu/sites/default/files/2021-11/international_comparison-of_costs_work_related_accidents.pdf

205 World Health Organization (WHO), 2023. “Road Traffic Injuries,” https://www.who.int/news-room/fact-sheets/detail/road-traffic-injuries

206 Ben Knight, 2023. “Global Conflicts: Death Toll at Highest in 21st Century,” https://www.dw.com/en/global-conflicts-death-toll-at-highest-in-21st-century/a-66047287

207 John Maynard Keynes, 1930. “Economic Possibilities for Our Grandchildren,” https://www.aspeninstitute.org/wp-content/uploads/files/content/upload/Intro_Session1.pdf

208 Aimee Picchi, 2019. “Billionaire Jack Ma, booster of 12-hour days, now says AI will allow 12-hour weeks,” https://www.cbsnews.com/news/billionaire-jack-ma-booster-of-12-hour-days-now-says-ai-will-allow-12-hour-weeks/

209 Aislinn Murphy, 2023. “Bill Gates says using AI could lead to 3-day work week,” https://www.foxbusiness.com/technology/bill-gates-suggests-artificial-intelligence-could-potentially-bring-three-day-work-week

210 Aimee Picchi, 2018. “What happened when a company paid its workers to work a 4-day week,” https://www.cbsnews.com/news/one-business-says-a-4-day-week-with-pay-for-5-works/

211 Annabelle Timsit, 2023. “A four-day workweek pilot was so successful most firms say they won’t go back,” https://www.washingtonpost.com/wellness/2023/02/21/four-day-work-week-results-uk/

212 James Wright, 2023. “Inside Japan’s experiment in automating eldercare,” https://www.technologyreview.com/2023/01/09/1065135/japan-automating-eldercare-robots/

213 David Graeber, 2018. “Bullshit Jobs: A Theory,” https://en.wikipedia.org/wiki/Bullshit_Jobs

214 Andy Beckett, 2018. “Post-work: The Radical Idea of a World Without Jobs,” https://www.greeneuropeanjournal.eu/post-work-the-radical-idea-of-a-world-without-jobs/

215 Visier, 2023. “New Research on AI’s Impact on Jobs, Time Saved Has Employees Divided,” https://www.visier.com/blog/ai-impact-on-jobs-employees-divided/

216 Adecco Group, 2024. “AI Saves Workers an Average of One Hour Each Day,” https://www.adeccogroup.com/our-group/media/press-releases/ai-saves-workers-an-average-of-one-hour-each-day

217 Wikipedia contributors, 2025. “Basic Income,” https://simple.wikipedia.org/wiki/Basic_income

218 Karl Widerquist, 2020. “The Deep and Enduring History of Universal Basic Income,” https://thereader.mitpress.mit.edu/the-deep-and-enduring-history-of-universal-basic-income/

219 Peter Jacobsen, 2024. “A second working paper shows that people who receive a guaranteed income tend to work less,” https://fee.org/articles/a-second-working-paper-shows-that-people-who-receive-a-guaranteed-income-tend-to-work-less/

220 Recent work on Corporate Digital Responsibility (CDR) deals with the ethical, fairness (i.e., bias), and privacy issues of AI. For prescriptions on how firms can improve their CDR governance, see the following studies: (1) Werner Kunz and Jochen Wirtz (2024), “Corporate Digital Responsibility (CDR) in the Age of AI – Implications for Interactive Marketing”, Journal of Research in Interactive Marketing, 19 (1), 31-37, https://doi.org/10.1108/JRIM-06-2023-0176. (2) Jochen Wirtz, Werner Kunz, Nicole Hartley, and James Tarbit (2023), “Corporate Digital Responsibility in Service Firms and their Ecosystems”, Journal of Service Research, Vol. 26, No. 2, 173–190, https://doi.org/10.1177/10946705221130467. (3) Lara Lobschat, Benjamin Müller, Felix Eggers, Laura Brandimarte, Sarah Diefenbach, Mirja Kroschke and Jochen Wirtz (2021), “Corporate Digital Responsibility”, Journal of Business Research, Vol. 122 (January), pp. 875-888, https://doi.org/10.1016/j.jbusres.2019.10.006.

221 Lu Wang, et al., 2025. “Large Action Models: From Inception to Implementation,” https://arxiv.org/abs/2412.10047